2. Summary of Technical Progress

During  the  past  year,  our  research  was  directed  towards  discovering  a  set  of  design 
principles  and  developing  efficient  algorithms  for  distributed  real-time  systems  and 
databases. Our efforts have been concentrated on three main areas: real-time transaction
processing, fault-tolerant multiprocessor scheduling, and experimental systems and
prototyping tools.

2.1.  Real-Time  Transactions 

One  of  the  most  important  achievements  in  this  project  is  the  development  of  new 
scheduling algorithms based on the idea of adjusting the serialization order of active
transactions dynamically. This is the first successful attempt to integrate the benefits of the
pessimistic and optimistic approaches to transaction scheduling. Two algorithms were
developed  based  on  the  notion  of  dynamic  serialization  to  control  blocking  and  aborting 
in  a  more  effective  manner.  One  is  based  on  a  priority-locking  mechanism  that  uses  the 
phase-dependent  control  of  optimistic  approach,  while  the  other  is  based  on  dynamic 
timestamp allocation. We have implemented the first, lock-based algorithm using the
StarLite environment for performance evaluation. When compared with conventional
transaction  scheduling  algorithms,  it  significantly  improves  the  percentage  of  high 
priority  transactions  that  meet  the  deadline.  Furthermore,  it  is  shown  that  the  algorithm 
provides  a  very  high  discriminating  power  which  enables  the  system  to  support  higher 
priority  transactions  at  the  expense  of  lower  priority  ones  when  a  transient  overload 
occurs. In addition, we have evaluated optimistic concurrency control protocols for real-time
database systems. Our results indicate that optimistic or hybrid approaches may
outperform the pessimistic approach over a wide operational range.

We have also developed algorithms for resource management in distributed real-time
systems: priority-ordered deadlock avoidance algorithms, efficient
deadlock detection/resolution algorithms using partial resource allocation graphs, and a
synchronization scheme for replicated critical data in distributed real-time database
systems. These algorithms are efficient for distributed real-time systems, in which
critical resources must be managed to support consistency while satisfying timing
constraints. For replication control in particular, we have employed a new consistency
criterion that is less stringent than conventional one-copy serializability. This scheme is
flexible and practical, because no prior knowledge of the data requirements or the
execution time of each transaction is required. Using our StarLite prototyping
environment, we have implemented these algorithms and demonstrated that they provide
a higher level of concurrency and greater flexibility in meeting timing requirements.

2.2. Fault-Tolerant Multiprocessor Scheduling

To  investigate  feasible  solutions  for  scheduling  real-time  tasks  in 
parallel/distributed  environments,  we  have  developed  a  new  paradigm  for  multiprocessor 
real-time  systems,  and  implemented  a  parallel  programming  interface  based  on  our 
paradigm. Our new paradigm has created new research opportunities for operating
systems and databases for parallel computing systems with timing and fault-tolerance
requirements. For example, using the new programming interface, we have developed
PRDB, an experimental real-time database system that runs on an emulated tightly-coupled
multiprocessor system in the StarLite environment. It provides a general
paradigm for exploiting parallelism and different real-time scheduling policies. This
experimental  system  has  been  used  for  investigating  implementation  techniques  for 
parallel  database  systems  and  the  impact  of  multiprocessor  technology  on  operating 
systems  design. 

To  support  both  real-time  and  fault-tolerance  requirements,  an  algorithm  to 
schedule  a  number  of  tasks  with  their  timing  and  precedence  constraints  on  a  number  of 
processors  is  necessary.  We  have  developed  a  scheduling  model  under  which  timing  and 
fault-tolerance  constraints  can  be  expressed.  Using  this  model,  a  scheduling  problem  to 
tolerate  one  arbitrary  task  error  or  processor  failure  has  been  studied.  Since  most 
multiprocessor  scheduling  problems  are  NP-complete,  we  have  developed  heuristics  to 
obtain  near-optimal  solutions  to  the  problem.  We  assume  that  all  the  critical  tasks  are 
periodic,  and  they  have  hard  deadlines.  We  use  two  versions  of  each  critical  task,  one  as 
the primary task and the other as the secondary. The scheduling algorithm is based on the
first-fit decreasing bin-packing heuristic. Using the StarLite environment, the algorithm
was implemented and its performance was evaluated. It was shown that the algorithm
performs very well, finding the optimal solution most of the time.
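
As an illustration of the first-fit decreasing heuristic mentioned above, the following is a minimal sketch and not the algorithm that was evaluated in StarLite. It assumes that each task copy is described by a hypothetical (name, group, load) tuple, that every processor has unit capacity, and that the primary and secondary copies of one task (same group) must be placed on different processors so that a single processor failure cannot remove both. These modeling choices are assumptions made for the example and are not taken from the report.

```python
def first_fit_decreasing(tasks, capacity=1.0):
    """Place task copies on processors with first-fit decreasing.

    tasks: list of (name, group, load) tuples (hypothetical format).
    Copies that share a group are never placed on the same processor.
    Returns a list of processors, each a list of task names.
    """
    processors = []  # each entry: {"load": float, "groups": set, "tasks": list}
    # Consider the largest loads first (the "decreasing" part).
    for name, group, load in sorted(tasks, key=lambda t: t[2], reverse=True):
        # Take the first processor that has room and no copy of this group.
        for p in processors:
            if p["load"] + load <= capacity + 1e-9 and group not in p["groups"]:
                break
        else:
            # No existing processor fits: open a new one.
            p = {"load": 0.0, "groups": set(), "tasks": []}
            processors.append(p)
        p["load"] += load
        p["groups"].add(group)
        p["tasks"].append(name)
    return [p["tasks"] for p in processors]


if __name__ == "__main__":
    demo = [("A1", "A", 0.5), ("A2", "A", 0.5), ("B1", "B", 0.3), ("B2", "B", 0.3)]
    print(first_fit_decreasing(demo))
```

In this sketch the number of processors is an output of the packing; a fixed processor count would instead require reporting failure when no processor fits.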

2.3. Experimental Systems and Prototyping Tools

We  have  developed  a  suite  of  database  systems  on  several  platforms,  such  as 
StarLite,  ARTS,  and  UNIX,  and  utilized  them  as  system  integration  testbeds.  Since  a 
real-time system must operate in the context of operating system services, the correct
functioning and timing behavior of the system depend heavily on the operating system
interfaces. We have developed a multi-threaded database server, called RTDB, for the ARTS
real-time operating system kernel. RTDB now supports an application programming
interface (API) and a graphical user interface. The API
provides an easy way for the database application programmer to construct batch clients.
It currently provides Create, Insert, Select, and Update. With the imprecise server, a
client can specify a deadline by which a computation (query) must complete. If the server
is unable to complete the entire query, it returns an imprecise result, provided
the computation has proceeded to a point where the output would be meaningful and
appropriate. One problem that hinders the transformation of a non-real-time database
function into a real-time one for the imprecise server is recursion. Recursive functions are not
amenable to being stopped as easily as iterative functions. To implement the imprecise
server, we have used the state machine approach to represent the execution stages of
each function. Necessary actions are performed with a measurable amount of time
allotted to each stage of execution.
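
The staged, deadline-bounded execution described above can be sketched as follows. This is only an illustration under stated assumptions and not the RTDB implementation: the function name run_imprecise, the representation of stages as a list of callables, the min_stages threshold for a "meaningful" result, and the injectable clock are all hypothetical.

```python
import time


def run_imprecise(stages, deadline, min_stages=1, clock=time.monotonic):
    """Run stages in order until all are done or the deadline has passed.

    stages: list of callables; each takes the previous result and returns
            a refined one (hypothetical representation of execution stages).
    deadline: absolute time on the same scale as clock().
    Returns (result, precise). result is None when fewer than min_stages
    stages finished, i.e. the output would not yet be meaningful.
    """
    result = None
    done = 0
    for stage in stages:
        # The deadline is checked between stages, so a stage is never
        # interrupted half-way; this is why iterative stages are preferred
        # over recursion.
        if clock() >= deadline:
            break
        result = stage(result)
        done += 1
    if done == len(stages):
        return result, True
    if done >= min_stages:
        return result, False
    return None, False
```

A caller would compute the deadline as clock() plus the time budget of the query and treat a result flagged as not precise as an approximation.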

In addition, we have developed a separate experimental system, called MRDB, a
real-time database kernel running in the Sun/Unix environment. In MRDB, servers and
clients can be created and removed dynamically. The servers use the valid-time attribute and
run-time estimates of requests in transaction scheduling to reduce the number of
transactions that miss their deadlines. Using MRDB, we have performed several experiments to
evaluate design alternatives in real-time scheduling and concurrency control. The
temporal database kernel for the Sun/Unix environment has been ported to the IBM RS/6000 with
AIX. An Ada programming interface was then developed to support a set of basic access
functions to the database. We have simulated RT-DOSE (Real-time Distributed
Operating System Experiments) using the interface.

Our experimental systems achieve another goal of this project: to transfer technology
developed under the StarLite project to the Navy, DoD, and other research organizations.
Currently, the Naval Ocean Systems Center in San Diego, California, is using RTDB for
its distributed real-time experiments.

Systems," Database Systems for Next-Generation Applications - Principles and
Practice, W. Kim, Y. Kambayashi, and I. Paik (eds.), World Scientific Publishing,
1993 (to appear).

(2)  S.  H.  Son,  C.  Chang,  and  Y.  Kim,  "Performance  Evaluation  of  Real-Time  Locking 
Protocols," Database Systems for Next-Generation Applications - Principles and
Practice, W. Kim, Y. Kambayashi, and I. Paik (eds.), World Scientific Publishing,
1993 (to appear).

(3)  S.  H.  Son  and  S.  Park,  "Scheduling  Transactions  for  Distributed  Time-Critical 
Applications," in Advances in Distributed Systems, T. Casavant and M. Singhal
(Editors), IEEE Computer Society, 1992 (to appear).

(4)  S.  H.  Son,  R.  Cook,  J.  Lee,  and  H.  Oh,  "New  Paradigms  for  Real-Time  Database 
Systems," in Real-Time Programming, K. Ramamritham and W. Halang
(Editors), Pergamon Press, 1992.

(5) R. Cook, L. Hsu, and S. H. Son, "Real-Time, Priority-Ordered, Deadlock
Avoidance  Algorithms,"  in  Foundations  of  Real-Time  Computing:  Scheduling  and 
Resource  Management,  A.  Van  Tilborg  and  G.  M.  Koob  (Editors),  Kluwer 
Academic  Publishers,  1991,  pp  307-324. 

(6)  S.  H.  Son,  Y.  Lin,  and  R.  Cook,  "Concurrency  Control  in  Real-Time  Database 
Systems,"  in  Foundations  of  Real-Time  Computing:  Scheduling  and  Resource 
Management,  A.  Van  Tilborg  and  G.  M.  Koob  (Editors),  Kluwer  Academic 
Publishers,  1991,  pp  185-202. 

(7)  R.  P.  Cook,  "The  StarLite  Operating  System,"  Operating  Systems  for  Mission- 
Critical  Computing,  K.  Gordon,  P.  Hwang,  and  A.  Agrawala  (Editors),  ACM 
Press,  1991. 




•  Refereed  Journal  Publications 

Distributed Database Research," Journal of Computer Simulation, (to appear).

(9)  S.  H.  Son,  J.  Lee,  and  Y.  Lin,  "Hybrid  Protocols  using  Dynamic  Adjustment  of 
Serialization  Order  for  Real-Time  Concurrency  Control,"  Journal  of  Real-Time 
Systems,  1992,  vol.  4,  no.  3,  pp  269-276. 

(10)  S.  H.  Son,  "Scheduling  Real-Time  Transactions  using  Priority,"  Information  and 
Software  Technology,  vol.  34,  no.  6,  June  1992,  pp  409-415. 

5. Software and Hardware Prototypes

The  RTDB  real-time  database  system  has  been  upgraded  and  delivered  to  NRaD. 
However, substantial work remains in fixing minor problems and
identifying performance bottlenecks. The StarLite prototyping environment has been
distributed to several universities as beta test sites. Both RTDB and StarLite still need
considerable work on documentation.

Abstract

Database  systems  for  real-time  applications  must 
satisfy  timing  constraints  associated  with  transactions,  in 
addition  to  maintaining  the  consistency  of  data.  In  this 
paper  we  examine  a  priority-driven  locking  protocol 
called the integrated real-time locking protocol. We show
that this protocol is free of deadlock and that, in addition, a
high priority transaction is not blocked by uncommitted
lower priority transactions. The protocol does not assume
any knowledge about the data requirements or the execution
time of each transaction. This makes the protocol
widely applicable, since in many actual environments
such information may not be readily available. Using a
database prototyping environment, it is shown that the
proposed protocol offers performance improvement over
the two-phase locking protocol.


1. Introduction

Real-time database systems (RTDBS) are transaction
processing systems where transactions have explicit
timing constraints. Typically, a timing constraint is
expressed in the form of a deadline, a certain time in the
future by which a transaction needs to be completed. A
deadline is said to be hard if it cannot be missed or else
the result is useless. If a deadline can be missed, it is a
soft deadline. With soft deadlines, the usefulness of a
result may decrease after the deadline is missed. In
RTDBS, the correctness of transaction processing
depends not only on maintaining consistency constraints
and producing correct results, but also on the time at
which a transaction is completed. Transactions must be
scheduled in such a way that they can be completed
before their corresponding deadlines expire. For example,
both the update and query on the tracking data for a
missile must be processed within given deadlines.
RTDBS are becoming increasingly important in a wide
range of applications, such as computer integrated
manufacturing, traffic control systems, robotics, and
stock market trading.

Conventional data models and databases are not
adequate for time-critical applications, since they are not
designed to provide features required to support real-time
transactions. They are designed to provide good average
performance, while possibly yielding unacceptable
worst-case response times. Very few of them allow users
to specify or ensure timing constraints. Recently, interest
in this new application domain has been growing in the database
community [Abb88, Abb89, Har90, Kor90, Lin90, Sha88,
Sha91, Son88b, Son91].

While the theories of concurrency control in database
systems and real-time task scheduling have both
advanced, little attention has been paid to the interaction
between concurrency control protocols and real-time
scheduling algorithms [Stan88]. In database concurrency
control, meeting the deadline is typically not addressed.
The objective is to provide a high degree of concurrency
and thus faster average response time without violating
data consistency. In real-time task scheduling, on the
other hand, it is customary to assume that tasks are
independent. The objective here is to maximize the
utilization of resources, such as the CPU, subject to meeting
timing constraints. In addition, it is assumed that the
resource and data requirements of tasks are known.
Scheduling in RTDBS is a combination of the two
scheduling mechanisms.

Real-time task scheduling methods can be extended
for real-time transaction scheduling, while concurrency
control protocols are still needed for operation scheduling
to maintain data consistency. The general approach is to
utilize existing concurrency control protocols, especially
two-phase locking (2PL), and to apply time-critical
transaction scheduling methods that favor more urgent
transactions [Abb88, Sha91]. Such approaches have the
inherent disadvantage of being limited by the concurrency
control method upon which they are based,
since all existing concurrency control methods synchronize
concurrent data access of transactions by a combination
of two measures: blocking and roll-backs of transactions.
Both are barriers to meeting time-critical
schedules.

Concurrency control protocols induce a serialization
order among conflicting transactions. In non-real-time
concurrency control protocols, timing constraints are
not a factor in the construction of this order. This is obviously
a drawback for RTDBS. The conservative 2PL
uses blocking, but in RTDBS, blocking may cause priority
inversion. Priority inversion is said to occur when a
high priority transaction is blocked by lower priority
transactions [Sha88]. The alternative is to abort low priority
transactions when priority inversion occurs. This wastes
the work done by the aborted transactions and in turn also
has a negative effect on time-critical scheduling.

Satisfying the timing constraints while preserving
data consistency requires the concurrency control algorithms
to accommodate timeliness of transactions as well
as to maintain data consistency. This is the very goal of
our work. If the information about data requirements and
execution time of each transaction is available beforehand,
off-line preanalysis can be performed to avoid
conflicts [Sha91]. However, such approaches may delay
the starting of some transactions, even if they have high
priorities, and may reduce the concurrency level in the
system. This, in turn, may lead to the violation of the
timing constraints and degrade system performance.

In this paper we examine a priority-driven two-phase
locking protocol called the integrated real-time
locking protocol. It is an integrated locking protocol,
since it decomposes the problem of concurrency control
into two subproblems, namely read-write synchronization
and write-write synchronization, and integrates the solutions
to the two subproblems consistently to yield a correct
solution to the entire problem [Bern87]. We show that
this protocol is free of deadlock. The protocol is similar
to optimistic concurrency control protocols [Kung81] in
the sense that each transaction has three phases, but
unlike the optimistic approach, there is no validation
phase. While other optimistic concurrency control protocols
resolve conflicts in the validation phase, this protocol
resolves them in the read phase using transaction priority.

The remainder of this paper is organized as follows.
The details of the locking protocol are described in
Section 2. The properties of the protocol are discussed in
Section 3. Section 4 presents a performance evaluation of
the real-time locking protocol. Finally, concluding
remarks appear in Section 5.

2.  The  Integrated  Real-Time  Locking  Protocol 

2.1.  Basic  Concepts 

An RTDBS is often used by applications such as
tracking. Since we cannot predict how many objects need
to be tracked and when they appear, we assume randomly
arriving transactions. Each transaction is assigned an initial
priority and a start-timestamp when it is submitted to
the system. The initial priority can be based on the deadline
and the criticality of the transaction. The start-timestamp
is appended to the initial priority to form the
actual priority that is used in scheduling. When we refer
to the priority of a transaction, we always mean the actual
priority with the start-timestamp appended. Since the
start-timestamp is unique, so is the priority of each transaction.
Transactions with the same initial
priority are distinguished by their start-timestamps.
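
A minimal sketch of this priority construction is shown below. The names make_priority and _start_ts are hypothetical, a simple counter stands in for the start-timestamp, and two conventions are assumed that the text does not fix: a larger initial priority value means a more urgent transaction, and among equal initial priorities the earlier start-timestamp ranks higher.

```python
import itertools

# Hypothetical source of unique, increasing start-timestamps.
_start_ts = itertools.count(1)


def make_priority(initial_priority):
    """Return the actual priority as a sortable key.

    The key is (negated initial priority, start-timestamp), so that
    sorting in ascending order lists the most urgent transaction first
    (assumed convention). The unique start-timestamp makes every key unique.
    """
    return (-initial_priority, next(_start_ts))
```

Because tuples compare element by element, the start-timestamp only matters when two transactions share the same initial priority.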

With  two-phase  locking  and  priority  assignment, 
we  can  encounter  the  problem  of  priority  inversion.  What 
we  need  is  a  concurrency  control  algorithm  that  allows 
transactions  to  meet  the  timing  constraints  as  much  as 
possible  without  reducing  the  concurrency  level  of  the 
system  in  the  absence  of  any  a  priori  information.  The 
integrated  real-time  locking  protocol  presented  in  this 
paper  meets  these  goals.  It  has  the  flavor  of  both  locking 
and  optimistic  methods. 

Transactions write into the database only after they
are committed. By using a priority-dependent locking
protocol, the serialization order of active transactions is
adjusted dynamically, making it possible for transactions
with higher priorities to be executed first so that higher
priority transactions are never blocked by uncommitted
lower priority transactions, while lower priority transactions
may not have to be aborted even in the face of
conflicting operations. The adjustment of the serialization
order can be viewed as a mechanism to support
time-critical scheduling.

Example 1: Assume $T_1$ and $T_2$ are two transactions with
$T_1$ having a higher priority. $T_2$ writes a data object x
before $T_1$ reads it. Using 2PL, even in the absence of any
other conflicting operations between these two transactions,
$T_1$ has to either abort $T_2$ or be blocked until $T_2$
releases the write lock.

In Example 1, under 2PL $T_1$ can never precede $T_2$ in the
serialization order, because the serialization order $T_2 \rightarrow T_1$ is
already determined by the past execution history. In our
protocol, when such a conflict occurs, the serialization
order of the two transactions will be adjusted in favor of
$T_1$, i.e. $T_1 \rightarrow T_2$, and neither is $T_1$ blocked nor is $T_2$
aborted. Together with priority-based blocking, the real-time
locking protocol is free from deadlocks.

All transactions that can be scheduled are placed in
a ready queue, R_Q. Only transactions in R_Q are
scheduled for execution. When a transaction is blocked,
it is removed from R_Q. When a transaction is
unblocked, it is inserted into R_Q again, but may still be
waiting to be assigned the CPU. A transaction is said to
be suspended when it is not executing, but still in R_Q.
When a transaction is doing I/O operations, it is blocked.
Once the I/O completes, the transaction is usually unblocked.

The execution of each transaction is divided into
three phases: the read phase, the wait phase and the write
phase. During the read phase, a transaction reads from
the database and writes to its local workspace. After it
completes, it waits for its chance to commit in the wait
phase. If it is committed, it switches into the write phase
during which all its updates are made permanent in the
database. A transaction in any of the three phases is
called active. We take the approach of integrated
schedulers, in that the protocol uses 2PL for read-write conflicts and
Thomas' Write Rule (TWR) for write-write conflicts.
The TWR ignores a write request that has arrived late,
rather than rejecting it [Bern87].

In our protocol, there are various data structures
that need to be read and updated in a consistent manner.
Therefore we assume the existence of critical sections to
guarantee that only one process at a time updates these
data structures. We assume critical sections of various
classes to group the various data structures and allow
maximum concurrency. We also assume that each
assignment statement of global data is executed atomically.

2.2.  Read  Phase 

The read phase is the normal execution of a transaction
except that write operations are performed on
private data copies in the local workspace of the transaction
instead of on data objects in the database. We call
such write operations prewrites, denoted by $pw_T[x]$. A
write request from a transaction is performed by a
prewrite operation. Since each transaction has its own
local workspace, a prewrite operation does not write into
the database, and if a transaction previously wrote a data
object, subsequent read operations to the same data object
retrieve the value from the local workspace.
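
The behavior of prewrites can be illustrated with a small sketch. It assumes, purely for the example, that the database is a Python dictionary and that a transaction is an object with a private workspace; the class and method names are hypothetical, and locking is left out on purpose.

```python
class Transaction:
    """Toy transaction with a private workspace (no locking shown)."""

    def __init__(self, db):
        self.db = db          # shared database, modeled as a dict
        self.workspace = {}   # private copies written by prewrites

    def prewrite(self, key, value):
        # A write request in the read phase only touches the workspace.
        self.workspace[key] = value

    def read(self, key):
        # A transaction sees its own prewrites before the database copy.
        if key in self.workspace:
            return self.workspace[key]
        return self.db[key]

    def write_phase(self):
        # After commit, the prewritten values are made permanent.
        self.db.update(self.workspace)
```

Until write_phase is called, other transactions reading the dictionary still see the old value of every prewritten object.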

The read-prewrite or prewrite-read conflicts
between active transactions are synchronized during this
phase by a priority-based locking protocol. Before a
transaction can perform a read (resp. prewrite) operation
on a data object, it must obtain the read (resp. write) lock
on that data object first. A read (resp. write) lock on x by
transaction T is denoted by rlock(T,x) (resp. wlock(T,x)).
If a transaction reads a data object that has been written
by itself, it gets the private copy in its own workspace
and no read lock is needed. In the rest of the paper, when
we refer to read operations, we exclude such read operations
because they do not induce any dependencies
among transactions.

The locking protocol is based on the principle that
higher priority transactions should complete before lower
priority transactions. That is, if two transactions conflict,
the higher priority transaction should precede the lower
priority transaction in the serialization order. Using an
appropriate CPU scheduling policy for RTDBS, a high
priority transaction can be scheduled to commit before a
low priority transaction in most cases [Lin90]. If a low
priority transaction does complete before a high priority
transaction, it is required to wait until it is sure that its
commitment will not lead to the higher priority transaction
being aborted.


Suppose active transaction $T_1$ has higher priority
than active transaction $T_2$. We have four possible
conflicts and the transaction dependencies they require in
the serialization order as follows:

(1) $r_{T_1}[x]$, $pw_{T_2}[x]$

The resulting serialization order is $T_1 \rightarrow T_2$, which
satisfies the priority order, and hence it is not necessary to
adjust the serialization order.

(2) $pw_{T_1}[x]$, $r_{T_2}[x]$

Two different serialization orders can be induced with
this conflict: $T_2 \rightarrow T_1$ with immediate reading, and
$T_1 \rightarrow T_2$ with delayed reading. Certainly, the latter
should be chosen for priority scheduling. The delayed
reading means that $r_{T_2}[x]$ is blocked by the write lock of
$T_1$ on x.

(3) $r_{T_2}[x]$, $pw_{T_1}[x]$

The resulting serialization order is $T_2 \rightarrow T_1$, which
violates the priority order. If $T_2$ is in the read phase, it is
aborted because otherwise $T_2$ must commit before $T_1$ and
thus block $T_1$. If $T_2$ is in its wait phase, we avoid aborting
$T_2$ until $T_1$ commits, in the hope that $T_2$ gets a chance to
commit before $T_1$ commits. If $T_1$ commits, $T_2$ is aborted.
But if $T_1$ is aborted by some other conflicting transaction,
then $T_2$ is committed. With this policy, we can avoid
unnecessary and useless aborts, while satisfying priority
scheduling.

(4) $pw_{T_2}[x]$, $r_{T_1}[x]$

Two different serialization orders can be induced with
this conflict: $T_1 \rightarrow T_2$ with immediate reading, and
$T_2 \rightarrow T_1$ with delayed reading. If $T_2$ is in its write
phase, delaying $T_1$ is the only choice. This blocking is
not a serious problem for $T_1$ because $T_2$ is expected to
finish writing x soon. $T_1$ can read x as soon as $T_2$
finishes writing x in the database, not necessarily after $T_2$
completes the whole write phase. If $T_2$ is in its read or
wait phase, choose immediate reading.

As transactions are being executed and conflicting
operations occur, all the information about the induced
dependencies in the serialization order needs to be
retained. To do this, we retain two sets for each transaction,
before_trset and after_trset, and a count, before_cnt.
The set before_trset (resp. after_trset) contains all the
active lower priority transactions that must precede (resp.
follow) this transaction in the serialization order.
before_cnt is the number of the higher priority transactions
that precede this transaction in the serialization
order. When a conflict occurs between two transactions,
their dependency is set and their values of before_trset,
after_trset, and before_cnt will be changed accordingly.

By  summarizing  what  we  discussed  above,  we 
define  the  real-time  locking  protocol  as  follows: 


LP1. Transaction T requests a read lock on data object x.

    for all transactions t with wlock(t,x) do
        if (priority(t) > priority(T)
            or t is in write phase)            /* Case 2, 4 */
        then deny the lock and exit;
        endif
    enddo

    for all transactions t with wlock(t,x) do  /* Case 4 */
        if t is in before_trset_T
        then abort t;
        else if (t is not in after_trset_T)
            then
                include t in after_trset_T;
                before_cnt_t := before_cnt_t + 1;
            endif
        endif
    enddo

    grant the lock;

LP2. Transaction T requests a write lock on data object x.

    for all transactions t with rlock(t,x) do
        if priority(t) > priority(T)
        then                                   /* Case 1 */
            if (T is not in after_trset_t)
            then
                include T in after_trset_t;
                before_cnt_T := before_cnt_T + 1;
            endif
        else
            if t is in wait phase              /* Case 3 */
            then
                if (t is in after_trset_T)
                then abort t;
                else
                    include t in before_trset_T;
                endif
            else if t is in read phase
                then abort t;
                endif
            endif
        endif
    enddo

    grant the lock;
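
The two lock-request procedures can be sketched in executable form as follows. This is an illustrative sketch under stated assumptions and not the authors' implementation: the class names, the use of Python sets for lock tables, and the convention that a larger priority value means a more urgent transaction are all hypothetical. The sketch also excludes the requester's own locks from the conflict checks and lowers before_cnt of dependent transactions when a transaction is aborted; both are inferred simplifications. Suspending and retrying a denied request is not modeled.

```python
READ, WAIT, WRITE = "read", "wait", "write"


class Txn:
    def __init__(self, name, priority):
        self.name, self.priority = name, priority  # larger = more urgent
        self.phase = READ
        self.before_trset, self.after_trset = set(), set()
        self.before_cnt = 0
        self.aborted = False


class LockManager:
    def __init__(self):
        self.rlocks, self.wlocks = {}, {}  # data object -> set of holders

    def abort(self, t):
        t.aborted = True
        for u in t.after_trset:            # t no longer has to precede these
            u.before_cnt -= 1
        for holders in list(self.rlocks.values()) + list(self.wlocks.values()):
            holders.discard(t)

    def read_lock(self, T, x):             # corresponds to LP1
        writers = self.wlocks.get(x, set()) - {T}
        if any(t.priority > T.priority or t.phase == WRITE for t in writers):
            return False                   # cases 2 and 4: deny the lock
        for t in list(writers):            # case 4: immediate reading
            if t in T.before_trset:
                self.abort(t)
            elif t not in T.after_trset:
                T.after_trset.add(t)
                t.before_cnt += 1
        self.rlocks.setdefault(x, set()).add(T)
        return True

    def write_lock(self, T, x):            # corresponds to LP2
        for t in list(self.rlocks.get(x, set()) - {T}):
            if t.priority > T.priority:    # case 1: T must follow t
                if T not in t.after_trset:
                    t.after_trset.add(T)
                    T.before_cnt += 1
            elif t.phase == WAIT:          # case 3, reader is waiting
                if t in T.after_trset:
                    self.abort(t)
                else:
                    T.before_trset.add(t)
            elif t.phase == READ:          # case 3, reader still reading
                self.abort(t)
        self.wlocks.setdefault(x, set()).add(T)
        return True
```

With the situation of Example 1, a low priority writer followed by a high priority reader of the same object, read_lock grants the lock at once and records that the writer must follow the reader, so neither transaction is blocked or aborted.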

LP1 and LP2 are actually two procedures of the
lock manager that are executed when a lock is requested.
When a lock is denied due to a conflicting lock, the
request is suspended until that conflicting lock is released.
Then the locking protocol is invoked once again from the
very beginning to decide whether the lock can be
granted now. With our locking protocol, a data object
may be both read locked and write locked by several
transactions simultaneously.

2.3. Wait Phase

The wait phase allows a transaction to wait until it
can commit. A transaction can commit only if all transactions
with higher priorities that must precede it in the
serialization order are either committed or aborted. Since
before_cnt is the number of such transactions, the transaction
can commit only if its before_cnt becomes zero.
A transaction in the wait phase may be aborted for two
reasons: if a higher priority transaction requests a
conflicting lock, or if a higher priority transaction that
must follow this transaction in the serialization order
commits first. Once a transaction in the wait phase gets
its chance to commit, i.e. its before_cnt goes to zero, it
switches to the write phase and releases all its read locks.
The transaction is assigned a final-timestamp, which
determines its absolute serialization order.
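
The commit test of the wait phase can be sketched as below. This is a simplified illustration, not the authors' code: the Txn fields mirror the sets and count defined earlier, the counter used for final-timestamps and the function name try_commit are hypothetical, and the release of read locks is omitted. The sketch assumes that when a transaction commits, waiting transactions in its before_trset are aborted (they had to precede it but did not commit first) and before_cnt of the transactions in its after_trset is decreased.

```python
import itertools
from dataclasses import dataclass, field

_final_ts = itertools.count(1)  # hypothetical source of final-timestamps


@dataclass(eq=False)            # eq=False keeps identity hashing for sets
class Txn:
    name: str
    phase: str = "wait"
    before_cnt: int = 0
    before_trset: set = field(default_factory=set)
    after_trset: set = field(default_factory=set)
    final_ts: int = None


def try_commit(T):
    """Move T from the wait phase to the write phase if it may commit."""
    if T.phase != "wait" or T.before_cnt != 0:
        return False
    for t in T.before_trset:    # had to precede T but is still waiting
        if t.phase == "wait":
            t.phase = "aborted"
    for t in T.after_trset:     # T no longer holds these transactions back
        t.before_cnt -= 1
    T.phase = "write"
    T.final_ts = next(_final_ts)
    return True
```

Calling try_commit on a lower priority transaction fails until every higher priority transaction that precedes it has committed or aborted, after which it receives the next final-timestamp.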

2.4.  Write  Phase 

Once a transaction is in the write phase, it is considered
to be committed. All committed transactions are
serialized by the final-timestamp order. Updates are
made permanent to the database while applying Thomas'
Write Rule (TWR) for write-write conflicts [Bern87].
After each operation the corresponding write lock is
released.
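
Thomas' Write Rule can be illustrated with a short sketch. The function `apply_write` and the dictionary-based `store` are hypothetical and stand in for the data manager; only the rule itself (a write carrying an older timestamp than the stored value is discarded) is taken from the text.

```python
def apply_write(store, key, value, ts):
    """Apply a write under Thomas' Write Rule; return True if it was applied.

    store maps key -> (value, timestamp of the write that produced it).
    """
    current = store.get(key)
    if current is not None and current[1] > ts:
        # Obsolete write: a value with a later final-timestamp is already stored.
        return False
    store[key] = (value, ts)
    return True
```

For example, if a write with timestamp 2 has been applied to x, a later-arriving write to x with timestamp 1 is ignored, so the stored state matches the final-timestamp order.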

3.  Properties  and  Correctness 

Having described the basic concepts and the protocol,
we now present some properties and prove the
correctness of the protocol. First, we give the simple
definitions of history and serialization graph (SG). For
the formal definitions, readers are referred to [Bern87].
A history is a partial order of operations that represents
the execution of a set of transactions. Any two
conflicting operations must be comparable. Let H be a
history. The serialization graph for H, denoted by SG(H),
is a directed graph whose nodes are committed transactions
in H and whose edges are all Ti → Tj (i ≠ j) such that
one of Ti's operations precedes and conflicts with one of
Tj's operations in H. To prove a history H serializable,
we only have to prove that SG(H) is acyclic [Bern87].

Theorem  1:  Every  history  H  produced  by  the  protocol  is 
serializable. 

Proof: Let T1 and T2 be two committed transactions in a
history H produced by the algorithm. We argue that if
there is an edge T1 → T2 in SG(H), then ts(T1) < ts(T2).

Since T1 → T2, the two must have conflicting operations.
There are three cases.

Case 1: w1[x] → w2[x]

Suppose ts(T2) < ts(T1). Therefore T2 enters into the
write phase before T1. If w1[x] is sent to the data
manager first, T2's write lock on x must be released
before w1[x] is sent to the data manager. If w2[x] is sent
to the data manager first, it will either be processed
before w1[x] is sent to the data manager, or be discarded
when the data manager receives w1[x], because w2[x]
has a smaller timestamp. Therefore w1[x] is never processed
before w2[x]. Such conflict is impossible. A contradiction.

Case 2: r1[x] → w2[x]

If T2 holds the write lock on x when T1 requests the read
lock, we must have priority(T1) > priority(T2) and T2 is
not in the write phase, because otherwise T1 would have
been blocked by LP1. By LP1, T2 is in after_trset_T1. T2
will not switch into the write phase before T1 does,
because before_cnt_T2 cannot be zero with T1 still in the
read or wait phase. Therefore ts(T1) < ts(T2). If T1
holds the read lock on x when T2 requests the write lock, by
LP2, we have either T2 is in after_trset_T1 or T1 is in
before_trset_T2, depending on the priorities of the two
transactions. In either case, T1 must commit before T2.
Hence we also have ts(T1) < ts(T2).

Case 3: w1[x] → r2[x]

Since T1 is already in the write phase before T2 reads x,
we must have ts(T1) < ts(T2).

Suppose there is a cycle T1 → T2 → ... → Tn → T1 in
SG(H). By the above argument, we have
ts(T1) < ts(T2) < ... < ts(Tn) < ts(T1). This is
impossible. Therefore no cycle can exist in SG(H) and
the algorithm only produces serializable histories. □
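
The acyclicity test used in the proof can be tried on small histories with the following sketch. It is an illustration only: the history is assumed to be given as a totally ordered list of `(transaction, operation, item)` tuples, which is a special case of the partial order in the definition, and the function names are hypothetical.

```python
def serialization_graph(history, committed):
    """Build the edge set of SG(H).

    history: list of (txn, op, item) in execution order, op is "r" or "w".
    committed: set of committed transaction names (the nodes of SG(H)).
    """
    edges = set()
    for i, (ti, op_i, x) in enumerate(history):
        for tj, op_j, y in history[i + 1:]:
            conflict = x == y and "w" in (op_i, op_j)
            if ti != tj and conflict and ti in committed and tj in committed:
                edges.add((ti, tj))
    return edges


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)
```

A history in which r1[x] precedes w2[x] and w1[y] precedes r2[y] yields the single edge T1 → T2 and passes the test, while reversing the order of the operations on y adds the edge T2 → T1 and the test fails.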

Theorem  2:  There  is  no  mutual  deadlock  under  the 
real-time  locking  protocol. 

Proof: In the algorithm, a high priority transaction can
be blocked by a low priority transaction only if the low
priority transaction is in the write phase. Suppose there is
a cycle in the wait-for graph (WFG),
T1 → T2 → ... → Tn → T1. For any edge Ti → Tj in
the cycle, if priority(Ti) > priority(Tj), Tj must be in the
write phase, thus it cannot be blocked by any other transactions
and cannot appear in the cycle. Therefore we
must have priority(Ti) < priority(Tj) and thus
priority(T1) < priority(T2) < ... < priority(Tn) <
priority(T1). This is impossible. Hence a deadlock cannot
exist. □

We now discuss some properties of the protocol.
First, the protocol provides a desirable property beyond
serializability, namely strictness. From a practical
viewpoint, serializability of transactions is not always
enough. To ensure correctness in the presence of
failures, the concurrency control protocol must produce
execution histories that are not only serializable but also
recoverable. A history H is called recoverable if, whenever
Ti reads from Tj in H and Ti commits, Tj must
commit before Ti. A history H is called strict if, whenever
wi[x] precedes oj[x], where oj[x] is either wj[x] or
rj[x], Ti must either commit or abort before oj[x]. It is
known that strictness is a stronger condition than recoverability,
i.e., the set of strict histories is a proper subset of
the recoverable histories, and it is more desirable for practical
reasons [Bern87].

The strictness of the histories produced by the algorithm
follows obviously from the fact that a transaction
applies the results of its write operations from its local
workspace into the database only after it commits. This
property makes the transaction recovery procedure
simpler than other concurrency control protocols that do
not support strictness.

Another  property  to  be  discussed  is  the  degree  of 
concurrency  provided  by  the  protocol.  The  compatibility 
depends  on  the  priorities  of  the  transactions  holding  and 
requesting  the  lock  and  the  phase  of  the  lock  holder  as 
well  as  the  lock  types.  Unlike  2PL,  locks  are  not 
classified  simply  as  shared  locks  and  exclusive  locks. 
Even  with  the  same  lock  types,  different  actions  may  be 
taken,  depending  on  the  priorities  of  the  lock  holder  and 
the  lock  requester.  With  the  real-time  locking  protocol,  a 
data  object  may  be  both  read  locked  and  write  locked  by 
several  transactions  simultaneously,  and  hence  it  is  less 
restrictive than 2PL, and can provide a higher degree of
concurrency by incurring less blocking and fewer aborts.
In the real-time locking protocol, a high priority transaction
is never blocked or aborted due to conflict with an
uncommitted  lower  priority  transaction.  The  probability 
of  aborting  a  lower  priority  transaction  should  be  less 
than  that  in  2PL  under  the  same  conditions.  An  analytical 
model  may  be  used  to  estimate  the  exact  probability,  but 
that  is  beyond  the  scope  of  this  paper. 

4.  Performance  Evaluation 

Since the integrated real-time locking protocol
assumes that the data requirement or execution time of
each transaction is not known, we should compare the
protocol with other protocols with the same assumption.
In this section, a comparative evaluation of the performance
of the real-time locking protocol is presented. The
results obtained through a simulation study indicate that
the real-time locking protocol offers performance
improvement over 2PL.

The performance of the real-time locking protocol
was studied using a prototyping environment for database
systems [Son90]. In our simulation, transactions are generated
and put into the start-up queue. When a transaction
is started, it leaves the start-up queue and enters the
ready queue. Transactions in the ready queue are ordered
from the highest to the lowest priority. The transaction
with the highest priority is always selected to run. The
current running transaction sends requests to the concurrency
controller. The transaction may be blocked and
placed in the block queue. It may also be aborted and restarted.
In such a case, it is first delayed for a certain
amount of time and then put in the ready queue again.
When a transaction in the block queue is unblocked, it
leaves the block queue and is placed in the ready queue.
Whenever a transaction enters the ready queue and its
priority is higher than the current running transaction, it
preempts the current running transaction.

When a transaction enters the start-up queue, it has
the arrival time, the deadline, the priority, the read set and
the write set associated with it. The transaction inter-arrival
time is a random variable with exponential distribution.
The deadline and the priority are computed by
the following formulas:

Deadline_T = Arrival_T + Slack * Time_T
Priority_T = 1 / Deadline_T

where

Deadline_T = Deadline of transaction T
Arrival_T = Arrival time of transaction T
Time_T = Service time of transaction T
Slack = Slack factor

The slack factor is a random variable between 3 and 5
with uniform distribution. The service time is the total
time that the transaction needs for its data processing.
This includes the CPU time and the I/O time. The deadline
formula is designed to ensure that all transactions,
independent of their service requirement, have the same
chance of making their deadline. The transaction priority
assignment policy is Earliest Deadline. Transactions
with earlier deadlines have higher priority than transactions
with later deadlines. A greater priority value means
higher priority. The data objects in the read set and the
write set are uniformly distributed across the entire database.
A transaction consists of a sequence of read and
write operations. A read operation involves a concurrency
control request to get access permission, followed
by a disk I/O to read the data object, followed by a
period of CPU usage for processing the data object.
Write operations are handled similarly except for their
disk I/O. Since it is assumed that transactions maintain
deferred update lists in buffers in main memory, disk
activity of write access is deferred until the transaction
has committed and switched into the write phase. A transaction
can be discarded at any time if its deadline is
missed. Therefore our model employs a hard deadline
policy.
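
The workload generation described above can be sketched in a few lines. This is not the prototyping environment used in the study; the function names, the dictionary layout, and the fixed `service_time` argument are assumptions made for illustration, and the slack range of 3 to 5 is taken from the text. Arrival and service times are assumed to be positive so that the deadline is nonzero.

```python
import random


def make_transaction(arrival, service_time, rng):
    """Assign a deadline and an Earliest Deadline priority to one transaction."""
    slack = rng.uniform(3.0, 5.0)                 # slack factor, uniform on [3, 5]
    deadline = arrival + slack * service_time     # Deadline_T = Arrival_T + Slack * Time_T
    priority = 1.0 / deadline                     # earlier deadline -> larger priority value
    return {"arrival": arrival, "service": service_time,
            "deadline": deadline, "priority": priority}


def generate(n, mean_interarrival, service_time, seed=0):
    """Generate n transactions with exponentially distributed inter-arrival times."""
    rng = random.Random(seed)
    now, txns = 0.0, []
    for _ in range(n):
        now += rng.expovariate(1.0 / mean_interarrival)
        txns.append(make_transaction(now, service_time, rng))
    return txns
```

Because the slack multiplies the service time, a transaction with a long service requirement receives a proportionally later deadline, which is the property the deadline formula is designed to provide.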

To ensure significance of the comparison, the classical
two-phase locking needs to be augmented with a
priority scheme to ensure that higher priority transactions
are not delayed by lower priority transactions. We used
the High Priority scheme [Abb88], in which all data
conflicts are resolved in favor of the transaction with
higher priority. When a transaction requests a lock on a
data object held by other transactions in an incompatible
mode, if the requester's priority is higher than that of all
the lock holders, the holders are restarted and the requester
is granted the lock; if the requester's priority is lower,
it waits for the lock holders to release the lock. This
scheme has the advantage of deadlock prevention.
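
The conflict resolution rule of the High Priority scheme can be written as a single decision function. The sketch below is an assumption-laden illustration: `request_lock` and its arguments are hypothetical names, `holders` is assumed to contain only the transactions holding the object in an incompatible mode, and the actual restart and queueing machinery is left out.

```python
def request_lock(requester, holders, priority):
    """Decide a lock request under the High Priority scheme.

    Returns ("grant", restarted) when the requester outranks every
    conflicting holder, where restarted lists the holders to restart,
    or ("wait", []) when some holder has equal or higher priority.
    """
    if all(priority[requester] > priority[h] for h in holders):
        return "grant", sorted(holders)
    return "wait", []
```

A requester therefore waits only for transactions of equal or higher priority, which is why a cycle of waiting transactions cannot form when priorities are distinct.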

For each experiment, we collected performance
statistics and averaged over 10 runs. We have used the
transaction size (the number of data objects a transaction
needs to access) as one of the key variables in the experiments.
It varies from a small fraction up to a relatively
large portion (15%) of the database so that conflicts
would occur frequently. The high conflict rate allows
concurrency control protocols to play a significant role in
the system performance. We chose the average arrival
rate so that protocols are tested in a heavily loaded rather
than lightly loaded system. This is because, in designing
real-time systems, one must consider high load situations.
Even though they may not arise frequently, one would
like to have a system that misses as few deadlines as possible
when the system is under stress [Abb88].

The primary performance metric used in analyzing
the experimental results is the miss percentage of the system,
defined as the percentage of transactions that do not
complete before their deadline. Miss percentage values
in the range of 0 to 20 percent can be taken to represent
system performance under "normal" loadings, while miss
percentage values in the range of 20 to 100 percent
represent system performance under "heavy" loading
[Har90]. A secondary performance metric, restarts, is
the number of restarts for a fixed number of transactions.
We chose this metric because it provides insight into the
system behavior. The advantage of the real-time locking
protocol is that while high priority transactions are not
blocked by low priority transactions, low priority transactions
need not be restarted most of the time. We can verify
this by using restarts as a performance metric.
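
As a small worked example of the primary metric, the sketch below computes a miss percentage and applies the 20 percent threshold. The function names are hypothetical, a discarded transaction is represented by `None`, and two boundary choices are assumptions not settled by the text: a transaction finishing exactly at its deadline is counted as meeting it, and a miss percentage of exactly 20 is classified as "normal".

```python
def miss_percentage(finish_times, deadlines):
    """Percentage of transactions that do not complete by their deadline.

    finish_times[i] is None when transaction i was discarded.
    """
    missed = sum(1 for f, d in zip(finish_times, deadlines)
                 if f is None or f > d)
    return 100.0 * missed / len(deadlines)


def load_class(miss_pct):
    """Classify the loading as "normal" (up to 20 percent) or "heavy"."""
    return "normal" if miss_pct <= 20.0 else "heavy"
```

With four transactions that share a deadline of 10 and finish at 5, never, 12, and 9, two of the four miss, giving a miss percentage of 50 and a "heavy" classification.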

Table  1  summarizes  the  key  parameters  of  the 
simulation  model  and  their  default  values.  Transaction 
size  (data  access  per  transaction)  is  the  total  number  of 
data  access  operations  of  each  transaction.  Among  all  the 
data  access  operations  of  a  transaction,  the  percentage  of 
write  operations  is  specified  by  the  write  percentage. 

By changing the mean inter-arrival time, we can
study the system performance under normal load and
heavy load. Fig. 4 shows that the real-time locking protocol
performs better than 2PL under both normal load and
heavy load. If we consider a miss percentage under 20%
as "normal", the real-time locking protocol can keep the
system operating satisfactorily when the mean inter-arrival
time is as small as 40 ms, while with 2PL the system
can maintain a normal load only when the mean
inter-arrival time is greater than 70 ms. Another interesting
result is that under normal load, the restart number for
each protocol is less than 10. A restart number greater
than 10 indicates a degraded system performance for

Abstract

In  this  paper,  we  consider  using  hardware  and  software 
redundancy  to  guarantee  task  deadlines  in  a  hard  real-time 
multiprocessor  system  even  in  the  presence  of  processor 
failures.  A  set  of  scheduling  requirements  for  the  real-time 
fault-tolerant  multiprocessor  scheduling  problem  is  first 
identified  and  a  heuristic  algorithm  is  then  proposed  to 
solve the problem. Experimental results show that the algorithm
finds optimal solutions in most of the cases.


I. Introduction

Hard real-time systems are defined as those systems in
which the correctness of the system depends not only on the
logical results of computation, but also on the time at which
the results are produced. Missing a hard deadline in such a
system may result in catastrophic consequences, such as
immediate danger to human life, severe damage to equipment,
or waste of expensive resources. Since hard real-time
systems are being increasingly used in many mission-critical
and life-critical applications, the fault tolerance of
these systems becomes extremely important.

A real-time system which possesses fault-tolerant
capability is usually termed a real-time fault-tolerant system.
The correctness of a real-time fault-tolerant system
requires that timing constraints of computations in the system
be met even in the presence of hardware or software
faults. Hardware or software faults in a real-time system
can lead to timing faults, such as missing a hard deadline,
which should be prohibited in a hard real-time system. To
tolerate timing faults, several studies have been carried out
in achieving fault tolerance in real-time systems through the
scheduling of redundant resources, such as replicated tasks
and redundant processor power [Anders83] [Balaji89]
[Bannis83] [Bertos91] [Liestm86] [Krishn86]. Most of
these works, however, focus either on uniprocessor scheduling,
or on achieving certain scheduling criteria, such as
minimizing schedule length or local cost function, or balancing
workload distribution. The problem of tolerating
processor failures in a hard real-time multiprocessor system
has not been sufficiently addressed in the literature.

The investigation of the problem is further necessitated
by the fact that many hard real-time systems are being supported
by multiprocessor systems. This is mainly due to the
following two reasons. First, a multiprocessor system is
generally more reliable than a uniprocessor system, because
the failure of one processor in a multiprocessor system does
not necessarily cause the whole system to fail if some fault
tolerance techniques are provided. Second, a multiprocessor
system can offer more computational power for hard
real-time systems than a uniprocessor system. However,
with these advantages also comes the disadvantage of a higher
likelihood of processor failures as more processors are
used. A multiprocessor system can be less reliable than a
uniprocessor system if one processor failure can cause the
whole system to fail. This can happen if no fault-tolerant
capability is provided. Thus, a processor failure in a hard
real-time multiprocessor system is a very serious problem,
which ought to be tackled. In this paper, we address this
problem by formally defining it as a real-time fault-tolerant
scheduling problem and then proposing a heuristic algorithm
to deal with processor failures.

The rest of the paper is organized as follows: Section II
defines the real-time fault-tolerant multiprocessor
scheduling  problem.  A  heuristic  scheduling  algorithm  is 
presented  in  Section  III.  The  analysis  and  performance 
evaluation  of  the  algorithm  are  described  in  Section  IV. 
Section  V  concludes  the  paper  and  suggests  future  work. 

II.  Problem  Statements 

We assume that processors fail in the fail-stop manner
and the failure of a processor can be detected by other processors.
All periodic tasks arrive at the system in one cycle
T, i.e., having the same period, and are ready to execute any
time within each cycle. We further assume that all periodic
tasks have hard deadlines and their deadlines have to be met
even in the presence of processor failures. We define a
task's meeting its deadline as either its primary copy or its
backup copy finishing before or at the deadline. Because the
failure of processors is unpredictable and there is no optimal
dynamic scheduling algorithm for multiprocessor
scheduling [Dertou89], we focus on static scheduling algorithms
to ensure that the deadlines of tasks are met even if
some of the processors might fail. The scheduling problem
is defined as follows:

The Scheduling Problem: A set of n periodic tasks
S = {T1, T2, ..., Tn} is to be scheduled on a number of processors.
For each task i, there are a primary copy Pi and a
backup copy Bi associated with it. The computation time of
a primary copy Pi is denoted as Ci, which is the same as the
computation time of its backup copy Bi. The tasks are independent
of each other. The scheduling requirements are
given as follows:

(1) Each task is executed by one processor at a time
and each processor executes one task at a time.

(2) All periodic tasks should meet their deadlines.
Aperiodic tasks have soft deadlines.

(3) Maximize the number of processor failures to be
tolerated.

(4) For each task i, the primary task Pi or the backup
Bi is assigned to only one processor for the duration
of Ci, and once it starts, it runs to its completion
unless a failure occurs.

(5) The number of processors used should be minimized.

The deadlines of aperiodic tasks are assumed to be soft.
However, as we will show later, the execution of aperiodic
tasks is taken into account. Thus, in a normal execution
situation, aperiodic tasks are able to meet their deadlines.
We further assume that all the processors are identical.
Requirement (1) specifies that there is no parallelism within
a task and within a processor. Requirement (2) dictates that
the deadlines of periodic tasks should be met, maybe at the
expense of more processors. Requirement (3) is a very
strong requirement. The primary and backup tasks should
be scheduled on different processors such that any one or
more processor failures will not result in the missing of the
hard deadlines of the periodic tasks. Furthermore, the primary
copy and the backup copy of a task should not overlap
each other, as we shall see in Lemma 2. Requirement (4)
implies that tasks are not preemptive. A processor is
informed of the failure of other processors only at the end of
the execution of a task. Also, care has to be taken to ensure
that exactly one of the two copies of a task is executed during
a cycle to minimize the wasted work. Requirement (5)
states that the number of processors to be used to execute
the tasks should be the smallest possible.

Since no efficient scheduling algorithm exists for the
optimal solution of the fault-tolerant real-time multiprocessor
scheduling problem as defined above, we resort to a
heuristic approach. A heuristic algorithm based on a bin
packing  algorithm  is  used  to  obtain  approximate  solutions. 
Before  presenting  the  heuristic,  we  state  the  following 
Lemmas  as  the  basic  results  upon  which  the  scheduling 
heuristic  is  developed. 

Lemma 1: In order to tolerate one or more processor
failures and guarantee that the deadlines of all the periodic
tasks are met using the primary-backup copy approach, the
longest computation time of the tasks must satisfy the following
condition, where T is the period of tasks:

max{C1, C2, ..., Cn} ≤ T/2
Proof: Suppose that the deadline of the task Tj can still
be met even if Cj > T/2. Suppose the processor which executes
Tj fails at the time of T/2 and the backup task Bj is
immediately started; then the finishing time of Tj is
BFj = T/2 + Cj. As Cj > T/2, we have BFj > T, i.e., the
deadline of the task is missed. This is a contradiction. □

Lemma 2: One arbitrary processor failure is tolerated
and the deadlines of tasks are met with the minimum number
of processors possible, if and only if the primary copy
Pi and the backup copy Bi of task i are scheduled on two
different processors and there is no overlapping between
them.

Proof: In [Lawler81], it is shown that a set of periodic
tasks is schedulable on a multiprocessor if and only if there
exists a valid schedule which is cyclic with a period T, i.e.,
each processor does exactly the same thing at time t as it
does at time t+T. Therefore it suffices to consider the execution
of tasks within a period T only. We first prove the
necessary condition. Suppose one arbitrary processor failure
is tolerated. It is evident that the primary copy of a task
and its backup copy should be scheduled on two different
processors. To prove that there is no overlapping between
the primary copy of a task and its backup copy, we define
BBi as the beginning time of the backup copy Bi and FPi as
the finishing time of the primary copy Pi. If there is an
overlapping between the primary copy of task i and its
backup copy, then FPi - BBi > 0. Suppose the processor k
on which the backup copy Bi of task i is assigned has no
unused time within a period and the processor j on which
the primary copy is executed fails at time t > BBi. Processor
k can only be notified of the failure of processor j no earlier
than t. Thus the finishing time of the whole schedule of
processor k is lengthened by t - BBi > 0, resulting in a
missed deadline. To prove the sufficient condition, we have
that any pair of primary and backup copies are scheduled on
two processors and there is no overlapping between them.
Then the failure of any one of the two processors will trigger
the execution of the backup tasks on another processor.
Thus the deadline of the tasks will be met. □
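
The two conditions of Lemma 2 are easy to check mechanically for a candidate schedule. The following sketch is illustrative only: the function name and the representation of a schedule as a mapping from task to `(processor, start, finish)` are assumptions, and the check covers just the placement and non-overlap conditions, not the deadline or the minimality of the processor count.

```python
def satisfies_lemma2(primary, backup):
    """Check that each task's primary and backup copies are placed on
    different processors and do not overlap in time.

    primary, backup: dict mapping task -> (processor, start, finish).
    """
    for task, (p_proc, p_start, p_finish) in primary.items():
        b_proc, b_start, b_finish = backup[task]
        if p_proc == b_proc:
            return False                      # same processor: one failure loses both copies
        if b_start < p_finish and p_start < b_finish:
            return False                      # the two execution intervals overlap
    return True
```

A backup copy that begins exactly when its primary copy finishes is accepted by this check, which corresponds to FPi - BBi = 0 in the notation of the proof.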


III. The Scheduling Algorithm

The basic idea of using the primary-backup copy approach
to tolerate processor failures is that there are two copies
associated with each task, i.e., the primary copy and the
backup copy. Once the primary copy fails, the backup copy
is activated. Since the possible execution of the backup
copies should also be finished before the deadline, enough
time must be reserved on each processor to execute the
backup copies. The reservation of enough time for the execution
of backup copies implies that redundant processors
have to be used to execute the primary task set early
enough so that once a processor failure occurs, there will be
time to execute the backup copies.

Our scheduling algorithm is based on the First-Fit
Decreasing (FFD) bin packing heuristic. In the FFD algorithm
for bin packing, the bins are numbered from 1 to M
and the items, pre-sorted into decreasing order of size, are
packed sequentially, each going into the lowest numbered
bin in which it will fit. In our algorithm, we regard processors
as bins and tasks as items having sizes equal to their
computation times. As shown in Lemma 1, the computation
times of all tasks should be less than half of the period in
order to tolerate at least one arbitrary processor failure.
Because the deadlines of the tasks are known a priori to be
T, T is used as the size of bins for the FFD heuristic.

The scheduling algorithm proceeds as follows: First,
the primary tasks are arranged in the order of decreasing
computation times, denoted as P1, P2, ..., Pn. Second, the
FFD heuristic is used to schedule the primary copies of the
tasks into bins with size T. More specifically, we begin with
one processor. Once the assignment of a task fails for the
existing processors, a new processor is added. Tasks are
assigned to processors in the order of their decreasing computation
time. In other words, task Pi is scheduled before
task Pj, where i < j. Task Pi is assigned to the lowest-indexed
processor on which its finishing time is less than
the period T. The schedule thus obtained is called the primary
schedule. Let the number of processors required be
m. It is apparent that though the tasks are schedulable to
finish before the deadline, at least one of the tasks will miss
its deadline if there is a failure. Therefore, the following
steps are necessary. Third, the primary schedule is duplicated
on another set of m processors to form the backup
schedule. The tasks in the backup schedule are swapped
based on the swapping rules to be defined below. Fourth,
the tasks in the two schedules, primary and backup, are all
renamed according to the following renaming
rule, such that the primary schedule uses 2 x m processors
and precedes the backup schedule, and there is no overlapping
between any pair of primary and backup copies of
tasks.

By  summarizing  what  we  described  above,  we  state  the 
algorithm  as  follows. 

procedure scheduler (Task Set, Period T);

    Sort the set of tasks in the order of decreasing
    computation time and rename them P1, P2, ..., Pn;

    Apply FFD (First-Fit Decreasing) to assign the set
    of tasks into m processors;

    Duplicate the schedule on m backup processors to
    form the backup schedule;

    Apply the swapping rules to the backup schedule;

    Apply the renaming rule to both the primary
    schedule and the backup schedule;

end scheduler.
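
The first two steps of the procedure (sorting and First-Fit Decreasing assignment) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function name and the tuple layout are hypothetical, the task set is assumed to be non-empty, a task is assumed to fit when its finishing time does not exceed T, and the swapping and renaming steps are not modeled.

```python
def ffd_primary_schedule(comp_times, period):
    """Build the primary schedule with First-Fit Decreasing.

    comp_times: computation times C_i of the tasks (index = task id).
    Returns a list of processors, each a list of (task, start, finish).
    """
    if max(comp_times) > period / 2:
        raise ValueError("Lemma 1 violated: some C_i exceeds T/2")
    # Decreasing computation time; ties keep their original order.
    order = sorted(range(len(comp_times)), key=lambda i: -comp_times[i])
    processors = []
    for i in order:
        c = comp_times[i]
        for sched in processors:
            end = sched[-1][2]                # finishing time of the last task placed
            if end + c <= period:             # first processor on which the task fits
                sched.append((i, end, end + c))
                break
        else:
            processors.append([(i, 0, c)])    # no processor fits: add a new one
    return processors
```

For computation times 4, 3, 5, 2, 2 and a period of 10, the sketch places the tasks of length 5 and 4 on the first processor and the remaining three tasks on a second one, so m = 2 before the backup schedule is formed.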

In  the  following,  we  define  the  rules  precisely  and 
prove  that  a  schedule  produced  by  applying  these  rules  can 
tolerate  one  arbitrary  processor  failure. 

Definition 1: For the schedule on each processor, L_h is
defined as the length of schedule less than or equal to half
of the period T such that it is the sum of the computation
times of those tasks whose finishing times are less than or
equal to half of the period. L_s is the length of schedule for a
processor. L_r is defined as L_s - L_h. Obviously, L_s ≤ T
and L_h ≤ T/2, as illustrated in Figure A. From now on,
where no confusion can be incurred, L_h is also used to
denote the time interval whose length is L_h. L_s and L_r are
also used in the similar manner.

In Figure 1, for example, L_h and L_s are equal to 5 and
9 respectively for processor 1. For processor 2, L_h is 4 and
L_s is 9.
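
The three lengths of Definition 1 can be computed directly from one processor's task list. The sketch below uses descriptive names (`half_len`, `sched_len`, and their difference) instead of the paper's symbols, and it assumes that the tasks run back to back starting at time 0; both choices are made for illustration only.

```python
def schedule_lengths(comp_times, period):
    """Return (half_len, sched_len, rest_len) for one processor.

    comp_times: computation times of the processor's tasks in execution order.
    half_len:  total time of the tasks that finish no later than period / 2.
    sched_len: total length of the schedule.
    rest_len:  sched_len - half_len.
    """
    finish = 0
    half_len = 0
    for c in comp_times:
        finish += c
        if finish <= period / 2:
            half_len = finish      # this task still finishes within the first half
    sched_len = finish
    return half_len, sched_len, sched_len - half_len
```

With an assumed period of 10, a processor running tasks of length 5 and 4 gives the lengths 5, 9, and 4, while a processor running tasks of length 4, 3, and 2 gives 4, 9, and 5.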

With the definitions of L_h, L_s, and L_r as above, the
swapping rules for each processor in the backup schedule
can be described as follows:

Swapping Rules:

(1) Tasks in L_h and tasks in L_r are swapped together.


Figures D & E: Any twin schedules after swapping but before renaming

Case 2: L_h < L_r, as shown in Figures D & E.

The tasks in L_h are swapped with the tasks in L_r. First
we claim that there are at least two tasks in L_r. Suppose
there is only one task in L_r. Because L_h < L_r, i.e., the
computation time of any task in L_h is shorter than the
computation time of the only task in L_r, this contradicts the FFD
algorithm for assigning tasks in the order of decreasing
computation time to processors. Therefore there are at least
two tasks in L_r. We further claim that if there is no overlapping
between the first primary copy in L_r of Figure E and
its backup copy in L_r of Figure D, there is no overlapping
between the primary copy of any task and its backup copy
on the twin processors. Suppose task w is one of the tasks,
but not the first task, in L_r of Figure E and its primary copy
overlaps with its backup copy in L_r of Figure D. Then the
computation time of task w must be longer than that of the
first task in L_r. This again contradicts the rules used by FFD
to assign tasks in the order of decreasing computation time
to processors. Now suppose that the first primary task in L_r
of Figure E overlaps with its backup task in L_r of Figure D
for the length of ε > 0 time units; then the computation time
of this task is L_h + ε, which again contradicts the rule
used by FFD to assign tasks to processors. We have shown
that there is no overlapping between the primary copy of
any task in L_r of Figure E and its backup copy in L_r of
Figure D. Since L_h < L_r, the primary copy of any task in L_h of
Figure D cannot overlap with its backup copy in L_h of
Figure E.

From the above two cases, it is clear that for any pair of
twin processors, one arbitrary processor failure is tolerated
and the deadlines of the tasks are guaranteed.
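
The non-overlap argument can be checked on small inputs with a sketch like the following. It assumes that the twin processor runs the swapped duplicate of the schedule from time 0, and it models only swapping rule (I); the renaming rule and any further rules are ignored. All names are hypothetical.

```python
def swap_backup(tasks, period):
    """Apply swapping rule (I) to a duplicated schedule.

    tasks is a list of (name, computation_time) in execution order.
    Returns the backup order: the L2 tasks first, then the L1 tasks.
    """
    finish, split = 0, 0
    for i, (_, c) in enumerate(tasks):
        finish += c
        if finish <= period / 2:
            split = i + 1  # tasks[:split] make up L1
    return tasks[split:] + tasks[:split]


def intervals(tasks):
    """Map each task name to its (start, end) interval, starting at time 0."""
    out, t = {}, 0
    for name, c in tasks:
        out[name] = (t, t + c)
        t += c
    return out


def overlapping_tasks(primary, backup):
    """Names of tasks whose primary and backup copies overlap in time."""
    p, b = intervals(primary), intervals(backup)
    return [n for n in p if p[n][0] < b[n][1] and b[n][0] < p[n][1]]


if __name__ == "__main__":
    # Tasks in decreasing computation time, as FFD would place them.
    primary = [("a", 5), ("b", 4), ("c", 3)]
    backup = swap_backup(primary, period=12)
    print(backup, overlapping_tasks(primary, backup))
```

With tasks in decreasing order no copy overlaps its twin. A schedule that violates the FFD order, such as a task of length 2 followed by one of length 7 with a period of 12, does produce an overlap, which mirrors the contradiction used in the proof.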


Though the main focus of our scheduling algorithm is to
guarantee that tasks with hard deadlines meet their deadlines
even in the presence of processor failures, tasks with soft
deadlines still have ample time for execution if there is no
processor failure or the number of processor failures is
small. This is achieved through the scheduling of primary
copies to finish around half of the period.

The time complexity of the algorithm is $O(nm)$ if the
tasks have already been sorted according to their computation
times. The sorting can be done in $O(n \log n)$ time.
Thus, the complexity of this algorithm is dominated by the
sorting process.

Because the multiprocessor scheduling problem is
known to be NP-complete, there is little hope of finding an
optimal solution to the problem even when the number of
tasks is small (e.g., 10). Thus, we consider the most ideal
case, which we call "best possible". The number of processors
used in the most ideal case is the ceiling of the sum of the
computation times of all the tasks (primary and backup)
divided by the cycle. The performance of the scheduling
algorithm and the "best possible" case is shown in Figure 4.
The computation time of each task is randomly generated from
the range [1, 20]. The period T is 90 time units. As we can
see, there is only one processor difference in most of the
cases. In other words, the scheduling algorithm finds near
optimal solutions.
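
The "best possible" bound is simple enough to compute directly. The sketch below assumes that every task has exactly one backup copy with the same computation time as its primary copy; the function name is hypothetical.

```python
import math


def best_possible_processors(computation_times, period):
    """Ceiling of the total primary plus backup work divided by the period."""
    # One primary and one backup copy per task (assumption of this sketch).
    total_work = 2 * sum(computation_times)
    return math.ceil(total_work / period)


if __name__ == "__main__":
    # Five hypothetical tasks with the period of 90 used in the experiments.
    print(best_possible_processors([20, 15, 10, 10, 5], period=90))
```

This bound ignores all placement constraints, so a real schedule may need more processors than it suggests.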


V.  Conclusion 

In this paper, we have identified the real-time fault-tolerant
multiprocessor scheduling problem and proposed an
efficient scheduling algorithm to solve it. Experimental
results show that the scheduling algorithm finds near-optimal
solutions. We have also shown that one arbitrary processor
failure can be tolerated by the scheduler.

There are many open questions which need to be
answered in order to design extremely reliable hard real-time
systems. The case where tasks have different periods
in the scheduling problem is still an open problem. Another
problem remains open where the processors available in the
system are all uniform processors (the speeds of the
processors have linear relations). These are the topics for
our future research.


IV.  Analysis  and  Performance  Evaluation 

It  is  apparent  that  the  scheduling  algorithm  meets  the 
scheduling  requirements  identified  in  Section  II.  In  the 
worst  case,  only  one  processor  failure  can  be  tolerated.  In 
the best case, up to $\lfloor m/2 \rfloor$ processor failures
can be tolerated, where m is the total number of processors
used.

Abstract

Transactions in real-time database systems
should be scheduled considering both data consistency
and timing constraints. Since a database system must
operate in the context of available operating system
services, an environment for database systems development
must provide facilities to support operating system
functions and integrate them with database systems for
experimentation. We chose the ARTS real-time operating
system kernel. In this paper we present our experience
in integrating a relational database manager with a
real-time operating system kernel and our attempts at
providing flexible control for concurrent transaction
management. Current research issues involving the
development of a programming interface and imprecise
computing server are also discussed.

1.  Introduction 

Time is the key factor to be considered in real-time
database systems, and the correctness of the system
depends not only on the logical results but also on the
time within which the results are produced. Transactions
must be scheduled in such a way that they can be completed
before their corresponding deadlines expire. For
example, both the update and query on the tracking data
for a missile must be processed within given deadlines,
satisfying not only database consistency constraints but
also timing constraints.

Conventional database systems are typically not
used in real-time applications because of their poor
performance and lack of predictability. They are
designed to provide good average performance, while
possibly yielding unacceptable worst-case response
times. In addition, conventional database systems do not
schedule their transactions to meet response requirements
and they commonly lock data tables to assure
only the consistency of the database. Locks and
time-driven scheduling are basically incompatible, resulting
in response requirement failures when low priority
transactions block higher priority transactions. New
techniques are required to manage the consistency of
real-time databases, and they should be compatible with
time-driven scheduling and meet both the required temporal
constraints and data consistency.

To address the inadequacies of current database
systems, the transaction scheduler needs to be able to
take advantage of the semantic and timing information
associated with data objects and transactions. A model
of real-time transactions needs to be developed which
characterizes distinctive features of real-time databases
and can contribute to the improved responsiveness of
the system. The semantic information of the transactions
investigated in the modeling study can be used to
develop efficient transaction schedulers [Son90b,
Son91].

A database system must operate in the context of
available operating system services, because correct
functioning and timing behavior of database control
algorithms depend on the services of the underlying
operating system. As pointed out by Stonebraker, operating
system services in many systems are not appropriate
for support of database functions [Ston81]. In many
areas, such as buffer management, recovery, and concurrency
control, operating system facilities have to be
duplicated by database systems because they are too
slow or inappropriate. An environment for database systems
development must, therefore, provide facilities to
support operating system functions and integrate them
with database systems for experimentation.

The ARTS real-time operating system kernel
under development at Carnegie-Mellon University
attempts to provide a "predictable, analyzable, and reliable
distributed real-time computing environment"
which is an excellent foundation for a real-time database
system [Tok89]. The ARTS system, which provides support
for programs written in C and C++, implements different
prioritized and non-prioritized scheduling
algorithms and prioritized message passing as well as
supporting lightweight tasks. All of these features are
important when considering a real-time database.

We have investigated the issues for integrating
real-time database systems with operating system kernels,
such as the features a database system requires
from real-time operating system kernels to provide
real-time transaction support. We have used relational
database technology since it provides the most flexible
means of accessing distributed data. Our research effort
resulted in a new database manager for distributed
real-time systems on top of ARTS. The result, RTDB,
consists of a multi-threaded server which accepts requests
from several clients. It provides a three-tiered approach
for supported media types, offering memory-resident
data options, local disk storage, and access to the UNIX
file system. This support of various media types gives
developers the flexibility to choose those that best suit
their needs. In addition, we have
incorporated the notion of imprecise computation
[Liu90] into RTDB to produce meaningful results before
the deadline of a task by trading off the quality of the
results for the computation time of the task. In this paper
we present our experience in integrating a relational
database manager with a real-time operating system kernel,
and our attempts at providing flexible control for
concurrent transaction management using a technique
called workload mediation. Current research issues
involving the development of a programming interface
and temporal consistency are also discussed.

2.  Comparison  with  Other  Systems 

One of the principal goals of the ARTS project is
to provide a more easily extensible real-time environment
than is currently enjoyed by programmers developing
on other kernels. To that end, ARTS requires
better data management facilities than many other kernels
offer. RTDB on ARTS represents a combination of
desirable aspects of database technology and development
flexibility. In comparing RTDB with other existing
systems, we note some differences between the
approach we are taking and that of other research or
commercial products.

In many cases, real-time database systems
should provide facilities to process concurrent transactions
from multiple users. This requires protocols and
algorithms for transaction scheduling and concurrency
control. In real-time transaction scheduling, the actual
execution order of operations is determined by two
factors: priority order and serialization order. The
difficulties in real-time transaction scheduling arise from the
fact that these two factors have different natures and are
constructed in different ways. While the serializable
execution order is strictly bound to the past execution
history, the priority order does not reflect the past
execution history and may dynamically destroy the order set
up in the past execution, and hence serializability. Most
research efforts on real-time database systems concentrate
on scheduling algorithms to resolve the conflicts
among multiple real-time transactions in a multi-user
environment. However, some systems assume a single-user,
dedicated-processor database system. In such an
environment there is no need to schedule multiple
real-time tasks on a single processor.

For example, CASE-DB is developed as a single-user,
disk-based, real-time relational DBMS, which
uses the relational algebra as its query language
[Ozso90]. In CASE-DB, it is assumed that the probability
that the query is not executable within the deadline is
close to 1. To process this kind of real-time query by
the given deadline, they restrict the queries using several
techniques, such as a sampling scheme for aggregate
queries. Since a real-time database grows quickly over time,
even for a periodic query the processing time can differ
depending on the number of scanned tuples or the
size of the relation. Thus the worst case execution time
of a transaction can be hard to determine or impractical.
Nevertheless, the question is how practical the assumptions
made in CASE-DB would be in actual real-time
system environments. Furthermore, this system assigns
priority to a part of a relation ("fragment set"), instead
of assigning a priority to a query. Then, the remaining
problem is how to agree a priori on a semantically
meaningful subset of each relation. RTDB diverges from this
design philosophy in many ways, being a multi-user,
distributed real-time DBMS.

Supported media types also differ among real-time
database systems. HP-RTDB, one of Hewlett Packard's
Industrial Precision Tools, provides software
application developers with a tool to structure and
access memory-resident data. Essentially, HP-RTDB is
a library of routines used to define and manipulate a
database schema, build the database in memory, as well
as load and unload, and write or read data to and from it.
It also provides mechanisms for archiving schema and
data and storing timestamp information. ARTS-RTDB
supports a three-tiered approach to data storage. The
user's options for data storage include memory-resident
relations, RAM-based disk relations, and storage on the
UNIX file system. Each medium has its own advantages
and drawbacks in terms of predictability, performance,
and recoverability. The relation media abstraction is
demonstrated in Figure 1, which depicts the ARTS-RTDB
testbed at the University of Virginia. Naturally,
access times decrease along this continuum. This support
of various media types gives developers the
flexibility to choose those that best suit
their needs. Also, we provide the ability to cross the
boundaries between these media, and to utilize several
media types in an individual query for both the source
and resultant relations. A detailed discussion of the
performance of those media types is reported in [Son91b].

3.  The  ARTS  Real-Time  OS  Kernel 

Research in the area of distributed, real-time
operating systems indicates that most are designed for a
specific need, and as such are difficult to build, maintain,
and modify; in addition, they do not afford the
capability of predicting runtime behavior during application
design. In fact, few non-real-time operating systems
provide a functionally complete set of general
purpose, real-time task and time management functions,
despite the fact that the user community is expressing
the desire for increasingly complex applications of this
type. Since the success of applications in real-time computing
is primarily contingent on a system's temporal
functionality, what is needed is an environment in which
the system engineer can analyze and predict, during the
design stage, whether the given real-time tasks having
various types of system and task interactions (e.g., memory
allocation/deallocation, message communications, I/O
interactions, etc.) can meet their timing requirements.

In an attempt to provide such functionality,
ARTS provides the process and data encapsulation that
other distributed, object-oriented operating systems do,
while at the same time including elements of temporal
significance to the services it provides. This integration
of data, thread, and concurrency control greatly facilitates
real-time schedulability analysis. ARTS can support
both hard and soft real-time tasks as well as
periodic and sporadic ones [Tok89].

To support time-critical operations, the ARTS
programming language interface allows designers to
specify timing requirements and the chosen communication
structure so that they are visible at both the language
and system level; this allows the system-wide
ARTS environment to make scheduling decisions based
on both temporal constraints and priorities of transactions.
The Integrated Time-Driven Scheduler (ITDS)
model of ARTS is more effective than the common
priority-based preemptive scheduling of many real-time
systems. Such simple schedulers become confused during
heavy system loads when they cannot decide which
tasks are important and should be completed and which
tasks should be aborted, causing unpredictability in the
applications. The ITDS model, however, employs a
time-varying "value function" which specifies both a
task's time criticality and semantic importance simultaneously.
A hard real-time task can be characterized by a
step function where the discontinuity occurs at the deadline,
while a soft real-time task is described by a continuous
(linear or nonlinear) decreasing function after its
critical time. In addition, ARTS' designers have separated
the policy and mechanism layers, so that users can
implement new scheduling policies with a minimum of
effort, and even change the policy dynamically at
runtime.
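
The two shapes of value function can be sketched as follows. This is an illustration only and not the ARTS interface: the function names, the linear decay, and the default parameters are all assumptions.

```python
def hard_value(t, deadline, value=1.0):
    """Step value function: full value up to the deadline, nothing after it."""
    return value if t <= deadline else 0.0


def soft_value(t, critical_time, value=1.0, decay=0.25):
    """Full value up to the critical time, then a linear decrease to zero."""
    if t <= critical_time:
        return value
    return max(0.0, value - decay * (t - critical_time))


if __name__ == "__main__":
    for t in (5, 10, 12, 20):
        print(t, hard_value(t, deadline=10), soft_value(t, critical_time=10))
```

A scheduler that compares such values under overload can tell which tasks are still worth completing, which is the information a plain priority number does not carry.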

The issue of priority inversion is crucial to providing
semantically correct system behavior in addition
to addressing temporal concerns. Priority inversion
occurs when a high priority activity waits for a lower
priority activity to complete. Resource sharing and
communication among the executing tasks can lead to priority
inversion if the operating system does not manage
the available resource set properly. Significant research
in the construction of ARTS was done to avoid priority
inversion among concurrently executing tasks; in the
processor scheduling domain, low priority servers
which provide service to clients of all priorities are
susceptible to inversion. For example, when a low priority
request is being serviced, and a high priority task
requests the same service, the high priority request
waits, since the server's computation is non-preemptable.
Any task of higher priority than the server may
preempt the server itself, however, so if a medium priority
task arrives it preempts the server indefinitely, causing
the high priority job to be lost in the shuffle. ARTS
employs a priority inheritance mechanism to propagate
information about a single computation which crosses
task boundaries. That is, if a server task accepts the
request of a client, the server inherits the priority of the
client. Furthermore, the server should also inherit the
priority of the highest priority task waiting for the
service.
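
The inheritance rule can be illustrated with a toy model. The class below is hypothetical and is not ARTS code; it assumes that a larger number means a more urgent priority and that the server never runs below its own base priority.

```python
class Server:
    """Toy server whose effective priority follows its clients."""

    def __init__(self, base_priority):
        self.base_priority = base_priority
        self.current = None   # priority of the request being serviced
        self.waiting = []     # priorities of requests waiting for service

    def accept(self, client_priority):
        """Start servicing a request; the server inherits its priority."""
        self.current = client_priority

    def enqueue(self, client_priority):
        """Record a request that must wait for the current one to finish."""
        self.waiting.append(client_priority)

    def effective_priority(self):
        """Highest of the base, current, and waiting priorities."""
        candidates = [self.base_priority] + self.waiting
        if self.current is not None:
            candidates.append(self.current)
        return max(candidates)


if __name__ == "__main__":
    server = Server(base_priority=1)
    server.accept(2)    # low priority request in service
    server.enqueue(7)   # high priority request arrives and waits
    print(server.effective_priority())
```

In this example a medium priority task at level 5 can no longer preempt the server, because the server now runs at the level of the waiting high priority request.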

The notion of time encapsulation cannot be
divorced from the basic structure of ARTS, in which
every computational entity is represented as an object
called an artobject. An artobject is defined as either a
passive or an active object. In a passive object there is
no explicit declaration of a thread which accepts incoming
invocation requests, while an active object contains
one or more threads defined by the user. In an active
object, its designer is responsible for providing concurrency
control among coexecuting operations. When a
new instance of an active object is created, its root
thread will be created and run immediately. A thread can
create threads within its object.

The ARTS kernel supports the notion of real-time
objects and real-time threads. A real-time object is
defined with a "time fence," a timer associated with the
thread which ensures that the remaining slack time is
larger than the worst case execution time for the operation.
A real-time thread can have a value function and
timing constraints related to its execution period,
worst-case execution time, phase, and delay value. When an
operation with a time fence is invoked, the operation
will be executed (or accepted) if there is enough remaining
computation time against the specified worst case
execution time of the operation for the caller. Otherwise,
it will be aborted as a time fence error. The objective of
this extension to a normal object paradigm is to prevent
timing errors from crossing task or module boundaries
(as often happens in traditional real-time systems which
use a cyclic executive) and to bound the timing error at
every object invocation.
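
The time fence check amounts to a guard evaluated at invocation time. The sketch below is a simplified illustration, not the ARTS mechanism: the exception type, the function signature, and the explicit now and deadline arguments are assumptions.

```python
class TimeFenceError(Exception):
    """Raised when an operation cannot finish within the caller's remaining time."""


def invoke(operation, wcet, now, deadline):
    """Run operation only if the caller's slack covers its worst case time."""
    remaining = deadline - now
    if remaining < wcet:
        # Reject at the object boundary instead of missing the deadline inside.
        raise TimeFenceError(f"need {wcet}, only {remaining} left")
    return operation()


if __name__ == "__main__":
    print(invoke(lambda: "done", wcet=3, now=0, deadline=10))
    try:
        invoke(lambda: "done", wcet=3, now=8, deadline=10)
    except TimeFenceError as err:
        print("time fence error:", err)
```

Because the check happens before the operation starts, a timing error surfaces in the caller that ran out of time rather than propagating into the invoked object.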

On top of the ARTS foundation we have built a
relational database manager using message passing
primitives and employing the client/server paradigm.
The result, RTDB, currently consists of a multi-threaded
server which accepts requests from several clients. Based
on the temporal urgency of the request, the server
determines whether it can commit the transaction or if it has
to reject it.

4.  The  RTDB  Real-Time  Database  Manager 

RTDB is a relational database manager written in
a hybrid C-based language called ARTS/C++ designed
to run on ARTS. It offers not only a functionally complete
set of relational operators (such as join, projection,
selection, union, and set difference) but also
other necessary operators such as create, insert, update,
delete, rename, compress, sort, extract, import, export,
and print. These operators give the user a good amount
of relational power and convenience in managing the
database.

We have developed two different kinds of clients
for RTDB. One is an interactive command parser/
request generator that makes requests to the server on
behalf of the user. This client looks and behaves similarly
to a single-user database manager. It is possible to
run the client without knowing that any interaction
between server and client is occurring. The other client
is a transaction-generating "batch" client, representing a
real-time process that needs to make database access
requests.

The RTDB server object is the heart of the database
management system. It is responsible for creating
and storing the relations, receiving and acting on
requests from multiple clients, and returning desired
information to the clients.

The server object defines three threads. The root
thread is an aperiodic thread, which is automatically
executed by ARTS upon invocation of the server. The
server activates one or more worker threads. The
worker threads are aperiodic and each one has a different
priority which will match the priority of the messages
it will service. The backup thread is a low priority
periodic thread responsible for periodically backing up
the relations that reside only in main memory.

The root thread of the server is responsible for
binding the server's name in the ARTS name server so
that the clients can find it and send requests. It is also
responsible for reading the relations into its local memory,
initializing the lock table and the blocked request
queue, and instantiating the backup thread and the server
worker threads. There is usually one worker thread created
for each priority level. After completing these
tasks, the root thread enters an infinite loop that accepts
database requests from any client. The requests come in
as packets. RTDB provides two different types of packets:
call packets and return packets. The call packet, created
by a client, contains all the information that the
server needs to carry out the desired database access
operation. Since different commands require different
information, the call packet has a variant field containing
different information for each command. When the
server completes the processing of the request, it returns
a packet to the client with the information requested.
This packet is called a return packet. The return packet
is created by the server and also has a variant field that
carries command specific information.
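
The shape of the two packet types can be sketched with plain data classes. The field names below are hypothetical; the paper states only that each packet carries a command-specific variant field, which is modeled here as a dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CallPacket:
    """Request created by a client."""
    command: str                                        # e.g. "insert", "join"
    priority: int                                       # message priority
    args: Dict[str, Any] = field(default_factory=dict)  # variant part


@dataclass
class ReturnPacket:
    """Reply created by the server."""
    command: str
    status: str
    result: Dict[str, Any] = field(default_factory=dict)  # variant part


if __name__ == "__main__":
    call = CallPacket("insert", priority=3,
                      args={"relation": "tracks", "tuple": (17, 42.5)})
    reply = ReturnPacket(call.command, status="ok", result={"inserted": 1})
    print(call)
    print(reply)
```

In a language with tagged unions the variant part would be one record type per command; a dictionary keeps the sketch short at the cost of type checking.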

The communication between the server and clients
is performed by the ARTS communication primitives:
Request, Accept, and Reply. The communication
is synchronous; when a client issues a Request, it is
blocked until the server Accepts and Replies to the message.
This may cause some problems, especially in a
real-time environment, for two reasons: priority inversion
and data sharing.

The ARTS kernel (and thus the RTDB system)
supports eight message priorities. When the root thread
Accepts a message, it extracts priority information from
the message packet. The root thread then enqueues the
request on the message queue (i.e., pending request
queue) of the worker thread designated to service
requests of that priority level. If inactive, the worker
thread will be polling its queue; if active, the requests
will be processed in FIFO order. Note that in this way
we can easily exploit the scheduling merits of the underlying
ARTS kernel without circumventing its priority-based
scheduling mechanism. Since the worker thread's
priority matches that of the messages it services, it will
only be scheduled for the CPU in an interval where its
priority is currently the highest in the system. This is the
general case; for those instances where the scheduling
technique is not priority based, or the ARTS priority
inheritance mechanism is employed, these decisions will
naturally be reflected in the workers.

This technique of distributing requests among a
pool of workers based on information contained in the
request packet is called workload mediation. In our
system workload mediation is realized by the server root
thread which accepts the messages from the clients and
puts them in the appropriate worker queue according to
their priority. It is intrinsic to implementing various
transaction prioritizing algorithms which utilize semantic
information provided by the clients and/or the database
transaction requests such as user-entered runtime
estimates, deadline constraints, or command-to-priority
mappings. Determining the proper balance of control
between ARTS primitives and RTDB explicit mediation
will help us achieve the most beneficial symbiosis of the
system's resources. Figure 2 illustrates the mediator
mechanism incorporated within the server object.
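
The routing performed by the root thread can be sketched without any threading. The class below is a hypothetical illustration of the queueing logic only: it assumes priorities numbered 0 to 7 and one FIFO queue per level, and it leaves the actual scheduling of workers to the kernel, as in the text.

```python
from collections import deque

NUM_PRIORITIES = 8  # message priority levels, per the text


class Mediator:
    """Root-thread logic: route each request to the queue of its priority."""

    def __init__(self, levels=NUM_PRIORITIES):
        self.queues = [deque() for _ in range(levels)]

    def dispatch(self, priority, request):
        """Enqueue a request on the worker queue for its priority level."""
        self.queues[priority].append(request)

    def next_request(self, priority):
        """Return what the worker for this level services next (FIFO), or None."""
        queue = self.queues[priority]
        return queue.popleft() if queue else None


if __name__ == "__main__":
    mediator = Mediator()
    mediator.dispatch(3, "query-1")
    mediator.dispatch(7, "update-1")
    mediator.dispatch(3, "query-2")
    print(mediator.next_request(7), mediator.next_request(3))
```

The mediator makes no scheduling decision itself; which worker actually runs is decided by the kernel from the worker priorities.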

This  mechanism  is  unobtrusive  from  the  view  of 
system-wide  scheduling,  because  it  does  not  do  any 
scheduling  on  its  own.  It  only  breaks  the  incoming 
workload  up  among  the  worker  pool.  Controlling  what 
criteria  are  used  to  make  this  static  assessment  is  impor¬ 
tant.  and  Table  1  indicates  the  techniques  we  are  investi¬ 
gating.  The  ARTS  OS  provides  eight  priority  levels.  In 
the  first  two  cases,  there  are  as  many  worker  threads  as 
priority  levels  and  the  mapping  is  direct;  in  the  first  case, 
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Table  1 

Workload  mediation  strategies  and  their  parameters 

according  to  message  priority,  and  in  the  second  case, 
according  to  the  priority  of  the  client  that  sent  the  mes¬ 
sage.  In  the  next  two  cases,  the  priority  levels  are  a  mul- 
ti[rfe  of  the  available  workers  so  the  priority  of  the 
incoming  message  (case  3)  or  that  of  the  client  that  sent 
it  (case  4)  has  to  be  divided  by  a  specific  number  before 
it  is  put  in  the  appropriate  worker  queue.  In  the  com¬ 
mand  mapping,  the  message  or  client  priority  is  ignored 
and  instead  the  priority  that  has  been  preassigned  to 
each  one  of  the  database  operations  determines  the 
worker  that  will  process  the  request.  The  next  case 
(complexity  mapping)  is  a  variation  of  the  previous  one 
where  priority  is  determined  according  to  previously 
calculated  complexity  rates  for  each  operation.  Site/ 
node  mapping  maps  each  node  participating  in  the  dis- 
u-ibuted  system  to  one  worker.  Finally,  media  based 
mapping  maps  the  request  according  U)  the  media 
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Figure  2.  Mediator  as  internal  server  object  ■ 


Trans.  Queue 


Figure  3.  Mediator  as  separate  object 


type(s) (virtual memory, hram, file system) housing the
relation(s) used in the request. We try as much as possible to
make mediation invisible and uncontrollable from the vantage
point of the clients, primarily for security reasons: the
server has the ability to ignore a query's requested priority
to prevent users from improperly seeking a higher priority than
they deserve. Also, since the server is in a better position to
analyze the current situation than remote users, allowing the
server to monitor itself minimizes the number of spurious
parameters that might enter the mediation algorithms.
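
The priority-based mappings described above can be sketched as small
functions that turn a request into a worker index. This is only an
illustration: the number of workers, the divisor, the command
priorities, and all names (Request, mediate, and so on) are assumptions
made for the example and are not taken from the RTDB source.

```python
from dataclasses import dataclass

NUM_WORKERS = 4          # assumed number of worker threads
LEVELS_PER_WORKER = 2    # assumed divisor: 8 priority levels over 4 workers
# Assumed preassigned priorities for the command mapping.
COMMAND_PRIORITY = {"select": 0, "insert": 1, "update": 2, "create": 3}


@dataclass
class Request:
    msg_priority: int     # priority carried by the message
    client_priority: int  # priority of the client that sent it
    command: str          # requested database operation


def scaled_message(req: Request) -> int:
    """Case 3: message priority divided down to a worker index."""
    return req.msg_priority // LEVELS_PER_WORKER


def scaled_client(req: Request) -> int:
    """Case 4: client priority divided down to a worker index."""
    return req.client_priority // LEVELS_PER_WORKER


def command_mapping(req: Request) -> int:
    """Command mapping: message and client priorities are ignored."""
    return COMMAND_PRIORITY[req.command]


def mediate(req: Request, strategy) -> int:
    """Return the index of the worker queue that receives the request."""
    worker = strategy(req)
    if not 0 <= worker < NUM_WORKERS:
        raise ValueError(f"no worker for mapped priority {worker}")
    return worker
```

Because the strategy is chosen inside the server, a client cannot
influence which function is applied to its request.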

Another  method  of  implementing  this  technique 
involves  creating  a  dedicated  object  which  acts  as  an 
intermediary  between  the  RTDB  client  and  server  (Fig¬ 
ure  3).  The  advantages  of  this  implementation  are 
enhanced  abstraction  for  the  mediation  algorithms,  and 
better  modeling  of  the  components  involved.  Also,  algo¬ 
rithm  implementors  have  a  centralized  repository  of  the 
routines  they  require  and  a  strong  definition  of  the  com¬ 
munication  interface  they  are  required  to  maintain. 
However,  it  is  our  contention  that  the  disadvantages  of 
this scheme outweigh its merits. The primary disadvantage
is a severe performance decrease due to the overhead
of increased inter-object message passing. In
addition,  many  of  the  data  structures  needed  by  the 
worker  threads  are  also  needed  by  the  mediation  algo¬ 
rithms. 

Returning to our model, the worker thread of the RTDB server
performs the client's request to access the database. It checks
its request message queue, calls the appropriate database
function that executes the requested operation, and replies
back to the client. The worker replies to a client without
completing a request when it needs to return more information
than can fit in a single packet. In such a case, the worker
sets a continuation flag indicating that there is more
information to be sent back to the client. The client must make
continuation requests to the server until it gets all the
information requested.

To maintain the consistency of the database, the RTDB server
needs to handle conflicting requests properly. For example, a
problem occurs when some request or part of a request (as in a
multi-relational query) has to be blocked since it needs to
lock a relation that is already locked. Our solution is to use
a lock table that keeps track of which relations are in use at
any given time. Currently, we use a coarse granularity for
locks, where the worker locks the file which contains the
relation it needs to access. If a request for file A comes in
while file A is being used by another active worker, then the
new request must be put on an internal queue until A and any
other files it needs are available. Using coarse granules
incurs low locking overhead, since there are fewer locks to
manage. However, it also reduces the degree of concurrency,
since operations are more likely to conflict. Fine granularity
locks (e.g., tuple locks) improve the degree of concurrency by
allowing a transaction to lock only those data items it
accesses. But fine granularity involves higher locking
overhead, since more locks must be requested and maintained. We
are investigating an appropriate granularity level for our
database system, including multi-granularity locking mechanisms
[Ber87].
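
A file-level lock table of the kind described above might look like the
following minimal sketch. The class and method names are hypothetical,
and the all-or-nothing acquisition is an assumption about how "A and
any other files it needs" would be handled.

```python
from collections import deque


class LockTable:
    """Coarse-grained lock table: one lock per relation file."""

    def __init__(self):
        self.owner = {}         # file name -> id of the worker holding it
        self.blocked = deque()  # (worker id, files) requests that must wait

    def try_lock(self, worker, files):
        """Lock every requested file, or none of them."""
        if any(self.owner.get(f, worker) != worker for f in files):
            return False
        for f in files:
            self.owner[f] = worker
        return True

    def request(self, worker, files):
        """Grant the locks, or park the request on the internal queue."""
        if self.try_lock(worker, files):
            return True
        self.blocked.append((worker, files))
        return False

    def release(self, worker):
        """Drop every lock held by the given worker."""
        for f in [f for f, w in self.owner.items() if w == worker]:
            del self.owner[f]
```

With one lock per file the table stays small, which is the low
overhead the text attributes to coarse granules; the cost is that two
requests touching different tuples of the same file still conflict.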

Whenever  the  worker  becomes  free,  it  first 
checks  its  queue  of  blocked  requests.  If  there  are  any 


requests  in  the  block  queue  that  can  be  unblocked,  it 
dequeues  the  request  and  processes  it.  If  no  request  in 
the  block  queue  is  ready  to  be  processed,  the  worker 
looks  to  its  incoming  request  queue. 
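
The selection rule above (blocked requests first, then the incoming
queue) can be sketched as follows. The request representation, a dict
with a "files" set, and the function name are assumptions made for
illustration only.

```python
from collections import deque


def next_request(blocked, incoming, locked_files):
    """Pick the worker's next request, preferring unblockable blocked ones.

    blocked and incoming are deques of requests; locked_files is the set
    of files currently held by other workers.
    """
    for i, req in enumerate(blocked):
        if not (req["files"] & locked_files):  # every needed file is free
            del blocked[i]
            return req
    if incoming:
        return incoming.popleft()
    return None  # nothing to do
```

In this sketch the caller is still responsible for moving an incoming
request onto the blocked queue if its files turn out to be locked.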

5.  The  Programming  Interface 

Conventional  database  systems  often  provide 
some  interface  through  which  they  export  functionality 
to  application  developers.  Such  programming  interfaces 
simplify  storage  and  retrieval  tasks  and  provide  a 
scheme  for  the  creation,  manipulation,  and  destruction 
of  database  files.  For  systems  utilizing  the  client-server 
paradigm,  communication  primitives  can  also  be 
accessed  through  such  an  interface,  achieving  further 
hiding  of  the  implementation  details. 

Programming interfaces in real-time databases differ greatly in
terms of application-developer friendliness. Some DBMS
interfaces are tightly coupled to theoretical techniques such
as the relational algebra. CASE-DB [Ozso90] is an example of
this type of interface. While this interface satisfies the
desired functionality requirements for a database, it can be
awkward to use when developing large, complex applications. For
these applications it is more appropriate to use an interface
similar to those already in use in non-real-time systems. These
application program interfaces consist of library functions.

To facilitate the construction of application clients, we have
written an application programming interface (API) for the
database command set which hides the implementation details of
the system as much as possible. In this way, developers who are
more familiar with function-call interfaces can quickly adjust
to the task of constructing custom application clients instead
of application programs. In addition to providing routines as
in other relational databases, we can hide the details of ARTS'
Request/Accept/Reply message passing sequence by developing an
appropriate programming interface for RTDB. Lack of such an
interface would require the application developer to be
familiar with the RTDB message passing mechanism, since it is
necessary to ensure correct communication between the
application program and the server. Moreover, the user would
have to be concerned with scheduling and concurrency issues,
because of the multi-user, multi-programming nature of RTDB.

With a programming interface, the client and server appear as
if the application client were the only one interacting with
the server. This goal is only partially attainable, since the
code provided by the application developer must coexist in the
same source code file as the code which specifies constants and
declarations necessary to construct the complete client image
(that is, certain C++ tokens which allow object creation and
specification). To expedite the development process, we provide
a thoroughly commented, standardized client template with which
developers need only combine their source and compile. All the
system-specific declarations and function calls that the
application developer need not be concerned about are coded in
the client template. These include the data structure
declarations used by the API and all the object and thread
declarations and instantiation function calls. When writing an
application, the user forms queries by placing
operation-specific information into function parameters in a
specified format and then calling the appropriate function.
This way, when interaction is not needed, a number of database
operations can be submitted in batch mode and intermediate
results can be manipulated and acted upon in a predefined,
user-specified manner, coded in the application program.

We currently support a small subset of database operations
through the API, namely: Create, Insert, Update, and Select.
This is a minimal set of operations required to perform
experiments on any relational database. We are planning to
support a complete interface by providing the full set of
database operations currently supported by our interactive
client.
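
To make the idea of a function-call interface concrete, the sketch
below wraps the four operations behind a small client class that
builds request packets and hands them to a server. It is not the RTDB
API: the class names, packet fields, and the in-memory stub server are
all hypothetical, and the real system exchanges packets through ARTS
message passing rather than a direct method call.

```python
class StubServer:
    """In-memory stand-in for the server; handles one packet at a time."""

    def __init__(self):
        self.relations = {}  # relation name -> list of tuples (dicts)

    def handle(self, packet):
        op = packet["op"]
        if op == "create":
            self.relations[packet["relation"]] = []
            return []
        rows = self.relations[packet["relation"]]
        if op == "insert":
            rows.append(dict(packet["values"]))
            return []
        # "update" and "select" both start from the matching tuples.
        matches = [row for row in rows if packet["where"](row)]
        if op == "update":
            for row in matches:
                row.update(packet["values"])
        return [dict(row) for row in matches]


class RTDBClient:
    """Function-call facade that hides packet construction from the user."""

    def __init__(self, server):
        self._server = server

    def _request(self, op, relation, **fields):
        return self._server.handle({"op": op, "relation": relation, **fields})

    def create(self, relation):
        return self._request("create", relation)

    def insert(self, relation, values):
        return self._request("insert", relation, values=values)

    def update(self, relation, where, values):
        return self._request("update", relation, where=where, values=values)

    def select(self, relation, where=lambda row: True):
        return self._request("select", relation, where=where)
```

A batch of operations is then just a sequence of calls whose results
the application program can inspect between steps.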

6. The RTDB Imprecise Server

Certain real-time applications require that some result of
initiated computation be available at a deadline, often at the
cost of absolute precision. Much research has been done in this
area, aptly named imprecise computation [Liu90]. The concept
behind imprecise computation is most often associated with
numeric computations whose precision is improved proportionally
with the amount of time spent performing the calculation.
However, several instances of using this technique with a
database merit consideration.

With RTDB, we have created a server object capable of
performing imprecise query retrievals. Basically, we provide
the client a mechanism to specify a deadline by which a
computation (query) must complete. This was easily accommodated
by adding a deadline field to the request packet that the
client sends to the server. Now the server knows not only the
operation requested, but also the time constraints upon the
operation. The server then attempts to complete the query,
checking repeatedly at strategic intervals whether it is in
danger of missing the specified deadline. If it is not, the
server continues working and returns the exact result of the
query to the user. If unable to complete the entire query, the
server will return imprecise data, provided the computation had
proceeded to a point where the output would be meaningful and
appropriate. This is an important caveat to consider, as
returning certain volatile data structures could have serious
detrimental effects if permitted. The result is that all tuples
returned from a query match the search criteria. However,
additional matching tuples may also exist in the relation, and
they would have been retrieved had sufficient time been
allotted.
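
The behavior described above can be illustrated with a selection loop
that checks the deadline every few tuples. This is a minimal sketch
under assumed names (imprecise_select, check_every); the check
interval and the injectable clock are illustrative choices, not the
server's actual parameters.

```python
import time


def imprecise_select(tuples, predicate, deadline, check_every=64,
                     clock=time.monotonic):
    """Scan tuples until done or until the deadline is reached.

    Returns (matches, complete). Every returned tuple satisfies the
    predicate; if complete is False, further matches may exist.
    """
    matches = []
    for i, t in enumerate(tuples):
        # Check the deadline at intervals rather than on every tuple.
        if i % check_every == 0 and clock() >= deadline:
            return matches, False
        if predicate(t):
            matches.append(t)
    return matches, True
```

The partial answer is a subset of the precise one, which is what makes
it safe to return for a selection; other operations would need their
own notion of a meaningful intermediate result.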

The intervals at which deadline checks are made were carefully
calculated. Each database request is serviced by breaking down
the operation into simple primitive operations. This, in turn,
requires making appropriate function calls that implement the
primitive operations. The execution time of each one of those
functions is variable and sometimes it is not enough to check
for a missed deadline upon calling and returning from the
function. The structure of several functions was changed in
order to provide real-time services, and in many cases,
deadline checking had to be performed more than once in a
function body. One problem that hinders the transformation of a
non-real-time function to a real-time one is recursion.
Recursive functions are not amenable to being interrupted as
easily as iterative functions. Due to these and other minor
causes, we have used the state machine approach to represent
the execution stages of each function, and to “pump” through
the necessary actions with a measurable amount of time allotted
to each stage of execution. The state machine approach proved
to be very useful because it simplified the analysis of
associated routines. It is our hypothesis that the state
machine is a useful tool for the real-time database, since it
isolates the various stages of action and allows run-time
estimates to be computed more easily.
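
One way to picture the state machine approach is a request that moves
through explicit stages, with the deadline checked against a per-stage
time budget before each step. The stages, budgets, and names below are
assumptions chosen for the example; the paper does not list the actual
stages used in RTDB.

```python
from enum import Enum, auto


class Stage(Enum):
    OPEN = auto()
    SCAN = auto()
    REPLY = auto()
    DONE = auto()


# Assumed time budget per stage, in the same units as the clock.
BUDGET = {Stage.OPEN: 1.0, Stage.SCAN: 5.0, Stage.REPLY: 1.0}


def step(stage, ctx):
    """Run one bounded unit of work and return the next stage."""
    if stage is Stage.OPEN:
        ctx["cursor"] = 0
        ctx["rows"] = []
        return Stage.SCAN
    if stage is Stage.SCAN:
        start = ctx["cursor"]
        chunk = ctx["data"][start:start + ctx["chunk"]]
        ctx["rows"].extend(chunk)
        ctx["cursor"] += len(chunk)
        return Stage.SCAN if ctx["cursor"] < len(ctx["data"]) else Stage.REPLY
    ctx["replied"] = True
    return Stage.DONE


def pump(ctx, deadline, clock):
    """Advance the machine until done or until the next stage will not fit."""
    stage = Stage.OPEN
    while stage is not Stage.DONE:
        if clock() + BUDGET[stage] > deadline:
            return stage  # stopped here; ctx holds the partial work
        stage = step(stage, ctx)
    return stage
```

Because each step is bounded and iterative, the loop can stop between
steps, which is harder to arrange inside a recursive function.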

Table 2 indicates an instance where an imprecise computation
may yield a satisfactory answer. Here, we issued the aggregate
operator “average” on a numeric attribute in a relation. The
relation file involved contained 978 tuples. We initially set
the deadline time to be 250 milliseconds from the packet
transmission time at the client site and we incremented the
deadline interval by 250 milliseconds until a precise answer
was obtained.

7.  Conclusions  and  Future  Work 

A real-time database manager is one of the critical components
of a real-time system, in which tasks are associated with
deadlines and a significant portion of data is highly
perishable in that it has value to the mission only if used in
a timely manner. To satisfy the timing requirements,
transactions must be scheduled considering not only consistency
constraints but also timing constraints. In addition, the
system should support a predictable behavior such that the
possibility of missing deadlines of critical tasks could be
determined ahead of time, before the deadlines expire. Since
the characteristics of a real-time database manager are
distinct from conventional database managers, there are
different kinds of issues to be considered in developing a
real-time database manager. For example, priority-based
scheduling and memory resident data are two such issues.

In  this  paper,  we  have  presented  an  experimental 
database  manager  developed  for  time-constrained  dis¬ 
tributed  systems.  The  foundation  now  exists  for  a  real¬ 
time  relational  database  manager.  We  have  discussed 
our  work  toward  providing  a  flexible  programming 
interface  and  standard  client  template  to  allow  quick 
prototyping  and  fast  modeling.  We  also  have  presented 
our  experiences  in  developing  a  server  based  on  the 
notion  of  imprecise  computing.  RTDB  described  in  this 
paper  with  its  multi-threaded  server  model  is  an  appro¬ 
priate  research  vehicle  for  investigating  new  techniques 


and  scheduling  algorithms  for  distributed  real-time  data¬ 
base  systems. 

As  with  any  active  research  project,  there  are 
several technical issues associated with real-time database
systems that need further investigation. For example, temporal
database components are being investigated for inclusion in
RTDB. They will address
the  desired  timestamping  of  surveillance  updates  gener¬ 
ated  by  radar,  sonar,  or  similar  equipment,  and  temporal 
consistency  requirements  of  real-time  transactions. 
Other  potential  improvements  in  efficient  implementa¬ 
tion  are  being  examined  to  determine  their  overall  value 
to  RTDB.  Indices  and  views  are  two  of  them.  Since 
such  features  not  only  alter  the  speed  and  predictability 
of  the  system  but  also  the  basic  file  structure,  they  need 
to  be  examined  closely  on  their  own  and  then  as  new 
elements  within  the  existing  system. 

We are also examining inclusion of run-time estimates for
various commands within the server which will enable it to
offer a choice of service to clients whose work cannot be
completed in the time allotted: imprecise results or a missed
deadline. Conceivably some clients might wish to simply exclude
some queries which might introduce incomplete results, and
terminate as quickly as possible. These execution estimates
would be maintained in a table in the server and would be based
on several factors such as relation file size, query type,
media types involved, and current resource utilization.
Properly implementing such a heuristic mechanism will require
carefully controlled execution timing, and some consideration
of the temporal impact of held data locks.
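
As a rough illustration of how such an estimate table could drive the
choice of service, the sketch below looks up a per-tuple cost and
scales it by file size and utilization. Everything here is
hypothetical: the table entries are made-up numbers, the linear cost
model is an assumption, and the service labels are invented for the
example.

```python
# Made-up per-tuple costs in milliseconds, keyed by (query type, media).
ESTIMATE_MS_PER_TUPLE = {
    ("select", "memory"): 0.25,
    ("select", "file"): 0.5,
    ("average", "file"): 0.5,
}


def estimated_ms(query_type, media, n_tuples, utilization):
    """Assumed linear model: cost grows with file size and current load."""
    per_tuple = ESTIMATE_MS_PER_TUPLE[(query_type, media)]
    return per_tuple * n_tuples * (1.0 + utilization)


def choose_service(estimate_ms, time_left_ms, preference):
    """Return "precise" if the query fits, else the client's preference.

    preference is one of "imprecise", "late", or "abort" (assumed labels
    for imprecise results, a missed deadline, or immediate termination).
    """
    if estimate_ms <= time_left_ms:
        return "precise"
    return preference
```

The lock-holding time mentioned in the text is not modeled here; a
fuller estimate would have to add the expected wait for held locks.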

References 

[Abb89]  Abbott, R. and H. Garcia-Molina, “Scheduling
Real-Time Transactions with Disk Resident Data,” VLDB
Conference, August 1989.

[Buc89]  Buchmann, A. et al., “Time-Critical Database
Scheduling: A Framework for Integrating Real-Time Scheduling
and Concurrency Control,” Fifth Data Engineering Conference,
Feb. 1989, 470-480.

[Comp91]  IEEE Computer, Special Issue on Real-Time Systems,
vol. 24, no. 5, May 1991.

[IEEE91]  Eighth IEEE Workshop on Real-Time Operating Systems
and Software, Atlanta, Georgia, May 1991.

[Kor90]  Korth, H., “Triggered Real-Time Databases with
Consistency Constraints,” 16th VLDB Conference, Brisbane,
Australia, Aug. 1990.

[Lin90]  Lin, Y. and S. H. Son, “Concurrency Control in
Real-Time Databases by Dynamic Adjustment of Serialization
Order,” 11th IEEE Real-Time Systems Symposium, Orlando,
Florida, Dec. 1990, to appear.

[Liu90]  Liu, J. et al., “Algorithms for Scheduling Imprecise
Computations,” ONR Annual Workshop on Foundations of Real-Time
Computing, Washington, DC, Oct. 1990.

[ONR91]  ONR Annual Workshop on Foundations of Real-Time
Computing, Washington, DC, Oct. 1991.

[Ozso90]  Ozsoyoglu, G. et al., “CASE-DB - A Real-Time
Database Management System,” Tech. Rep., Case Western Reserve
University, 1990.

[Sha88]  Sha, L., R. Rajkumar, and J. Lehoczky, “Concurrency
Control for Distributed Real-Time Databases,” ACM SIGMOD
Record 17, 1, Special Issue on Real-Time Database Systems,
March 1988, 82-98.

[Sha91]  Sha, L., R. Rajkumar, S. H. Son, and C. Chang, “A
Real-Time Locking Protocol,” IEEE Transactions on Computers,
vol. 40, no. 7, July 1991, 793-800.

[Son88]  Son, S. H., guest editor, ACM SIGMOD Record 17, 1,
Special Issue on Real-Time Database Systems, March 1988.

[Son90]  Son, S. H. and C. Chang, “Performance Evaluation of
Real-Time Locking Protocols using a Distributed Software
Prototyping Environment,” 10th International Conference on
Distributed Computing Systems, Paris, France, June 1990,
124-131.

[Son90b]  Son, S. H. and J. Lee, “Scheduling Real-Time
Transactions in Distributed Database Systems,” 7th IEEE
Workshop on Real-Time Operating Systems and Software,
Charlottesville, Virginia, May 1990, 39-43.

[Son91]  Son, S. H., P. Wagle, and S. Park, “Real-Time
Database Scheduling: Design, Implementation, and Performance
Evaluation,” International Symposium on Database Systems for
Advanced Applications (DASFAA '91), Tokyo, Japan, April 1991,
146-155.

[Son91b]  Son, S. H., M. Poris, and C. Iannacone,
“Implementing a Distributed Real-Time Database Manager,” The
Second International Symposium on Database Systems for
Advanced Applications (DASFAA '91), Tokyo, Japan, April 1991,
51-60.

[Ston81]  Stonebraker, M., “Operating System Support for
Database Management,” Commun. of ACM 24, 7 (July 1981),
412-418.

[Tok89]  Tokuda, H. and C. Mercer, “ARTS: A Distributed
Real-Time Kernel,” ACM Operating Systems Review, 23 (3), July
1989.


Replication  Control  for  Distributed  Real-Time  Database  Systems 

Sang  H.  Son  and  Spiros  Kouloumbis 

Computer  Science  Department 
University of Virginia
Charlottesville,  VA  22903,  USA 


ABSTRACT 

Schedulers for real-time distributed replicated databases must
satisfy two requirements: transactions should meet their timing
constraints, and mutual consistency of replicated data should
be preserved. In this paper, we propose a new replication
control algorithm, which integrates real-time scheduling and
replication control. The algorithm adopts a token-based scheme
for replication control and attempts to balance the urgency of
real-time transactions with the conflict resolution policies.
In addition, the algorithm employs epsilon-serializability
(ESR), a new correctness criterion which is less stringent than
conventional one-copy-serializability. The algorithm is
flexible and very practical, since no prior knowledge of data
requirements or execution time of each transaction is required.

1.  Introduction 

In Real-time Distributed Database Systems (RTDDBS),
transactions must be scheduled to meet the timing constraints
and to ensure that the replicas remain mutually consistent
[Son90]. Real-time task scheduling can be used to enforce
timing constraints on transactions, while concurrency control
is employed to maintain data consistency. Unfortunately, the
integration of the two mechanisms is non-trivial because of the
trade-offs involved. Serializability may be too strong as a
correctness criterion for concurrency control in database
systems with timing constraints, for serializability limits
concurrency. As a consequence, data consistency might be
compromised to satisfy timing constraints.

In  real-time  scheduling,  tasks  are  assumed  to  be 
independent,  and  the  time  spent  synchronizing  their 
access  to  shared  data  is  assumed  to  be  negligible  com¬ 
pared  with  execution  time.  Knowledge  of  resource  and 
data  requirements  of  tasks  is  also  assumed  to  be  available 
in  advance. 

In replication control methods, on the other hand, the
objective is to provide a high degree of concurrency and thus
faster average response time without violating data consistency
[Son87]. Two different policies can be employed in order to
synchronize concurrent data access of transactions and to
ensure identical replica values: blocking transactions or
aborting transactions. However, blocking may cause priority
inversion when a high priority transaction is blocked by lower
priority transactions. Aborting lower priority transactions,
though, wastes the work done by them. Thus, both policies have
negative effects on time-critical scheduling.

Conventional  replication  control  algorithms  are  syn¬ 
chronous,  in  the  sense  that  they  require  the  atomic  updat¬ 
ing  of  some  number  of  copies.  This  leads  to  reduced 
system  availability  and  decreased  throughput  as  the  size 
of  the  system  increases.  On  the  other  hand,  asynchronous 
replication  control  methods  that  would  allow  more  trans¬ 
actions  to  meet  their  deadlines  suffer  from  a  basic  prob¬ 
lem:  the  system  enters  an  inconsistent  state  in  which 
replicas  of  a  given  data  object  may  not  share  the  same 
value. Standard correctness criteria for coherency control
such as 1-copy serializability (1SR) [Ber87] are thus hard to
attain with asynchronous consistency control.

A  less  stringent,  general-purpose  consistency  crite¬ 
rion  is  necessary.  The  new  criterion  should  allow  more 
real-time  transactions  to  satisfy  their  timing  constraints 
by  temporarily  sacrificing  database  consistency  to  some 
small degree. Epsilon-serializability (ESR) is such a
correctness criterion, offering the possibility of maintaining
mutual consistency of replicated data asynchronously [Pu91].
Inconsistent data may be seen by certain query transactions,
but data will eventually converge to a consistent (1SR) state.
Additionally, the degree of inconsistency can be controlled
within a specified threshold.

The goal of our work is to design a replication control
algorithm that allows as many transactions as possible to meet
their deadlines and at the same time maintains the consistency
of replicated data in the absence of any a priori information.
Our algorithm is based on a token-based synchronization scheme
for replicated data in conventional distributed databases.
Real-time scheduling features are developed on top of this
platform. Epsilon-serializability is employed as the
correctness criterion that guarantees the robustness of the
scheme.

2.  Database  Model 

Before presenting our real-time replication scheme,
we  first  present  the  model  of  the  underlying  distributed 
database  system  and  transaction  processing. 

2.1  Distributed  System  Environment 

A distributed system consists of multiple autonomous computer
systems (sites) connected via a communication network. Each
site maintains a local database system. The smallest unit of
data accessible to the user is called a data object. A data
object is an abstraction that does not correspond directly to a
real database item. In distributed database systems with
replicated data objects, a logical data object is represented
by a set of one or more replicated physical data objects. We
assume that the database is fully replicated at all sites. Read
and Write are the two fundamental types of logical operations
that are implemented by executing the respective physical
operation on one or more copies of the physical data object in
question. A token designates a read-write copy. Each logical
data object has a predetermined number of tokens, and each
token copy is the latest version of the data object. The site
which has a token copy of a logical data object is called a
token site with respect to the logical data object. In order to
control the access to data objects, the system uses timestamps.
When a write operation is successfully performed and the
transaction is committed, a new version is created which
replaces the previous version of the token copy.

When a transaction performs a write operation to a physical
data object, there are two values that are associated with the
data object: the after-value (the new version) and the
before-value (the old version). Because the before-value is
available during the transaction processing, it is natural to
ask if concurrency can be improved by giving out this value
[Bay80]. For example, if the transaction T1 has been given
permission to write the new value of a data object and the
transaction T2 requests to read the same data object, then it
is possible to give T2 the before-value of the data object,
instead of making T2 wait until T1 is finished. However,
appropriate control must be exercised in doing so, otherwise
the database consistency might be violated. In the example
above, assume that T1 has written a new value for two data
objects X and Y, and T2 has read the before-value of X. T2
wants to read Y also. If T2 gets the after-value of Y created
by T1, there is no serial execution of T1 and T2 having the
same effect, because in reading the before-value of X, T2 sees
the database in a state before the execution of T1, and in
reading the after-value of Y, T2 sees the database in a state
after the execution of T1.

2.2 Transactions

A  transaction  is  a  sequence  of  operations  that  takes 
the  database  from  a  consistent  state  to  another  consistent 
state.  Two  types  of  transactions  are  allowed  in  our  envi¬ 
ronment:  query  transactions  and  update  transactions. 
Query  transactions  consist  only  of  read  operations  that 
access  data  objects  and  return  their  values  to  the  user. 
Update transactions consist of both read and write operations.
They execute a sequence of local computations
and  update  the  values  of  all  replicas  of  each  associated 
data  object. 

Transactions arriving at the system are assumed to be
non-periodic. A globally unique timestamp is generated for each
transaction [Lam78]. Each time a transaction is aborted and
resubmitted, a new timestamp value is assigned to it. If a
transaction T1 has a smaller timestamp than another transaction
T2, we say that T1 is the older transaction and T2 is the
younger one.
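
A common way to obtain globally unique timestamps in the style of
[Lam78] is to pair a local logical counter with a site identifier.
The sketch below assumes that scheme; the class and function names are
illustrative, and the paper does not state which timestamp encoding
the system actually uses.

```python
class TimestampGenerator:
    """Logical clock for one site; timestamps are (counter, site_id)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next(self):
        """Issue a fresh timestamp, e.g. for a new or resubmitted transaction."""
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, timestamp):
        """Advance the local counter past a timestamp seen in a message."""
        self.counter = max(self.counter, timestamp[0])


def is_older(ts_a, ts_b):
    """The transaction with the smaller timestamp is the older one."""
    return ts_a < ts_b
```

The site identifier breaks ties between equal counters, so two sites
can never issue the same timestamp.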

We  assume  no  a  priori  knowledge  of  which  or  how 
many data objects are going to be accessed by each individual
transaction. However, we assume that the average lengths of
query and update transactions are known in order to control
the level of inconsistency. Transactions
that  miss  their  deadlines  are  immediately  aborted. 

Read or write operations of the same transaction are executed
one by one in a serial fashion. Each read and write carries the
timestamp of the transaction that issued it, and each copy
carries the timestamp of the transaction that wrote it. A
conflict occurs when a transaction issues a request to access a
data object for which another transaction has previously issued
an access request, and at least one of these requests is a
write request. There are three kinds of conflicts: read-write
(RW), write-read (WR), and write-write (WW) conflicts [Ber87].
In each case, we say that the transaction requesting the new
access has caused a conflict.

2.3  Token-Based  Conflict  Resolution 

Let T1 be the transaction which already issued an access
request, and let T2 cause the conflict. For each token copy of
X, conflicts are resolved as follows [Son89]:

(1) RW conflict: If T2 is younger than T1, then it waits for
the termination of T1. If T2 is older than T1, then it reads
the before-value of X.

(2) WR conflict: If T2 is younger than T1, then its write
request is granted with the condition that T2 cannot commit
before the termination of T1. If T2 is older than T1, then T2
is rejected.

(3) WW conflict: If T2 is younger than T1, then it waits for
the termination of T1. If T2 is older than T1, then T2 is
rejected.
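
The three rules can be condensed into a single decision function.
This sketch reads the rules with T1 as the transaction that already
issued its request and T2 as the one causing the conflict, and treats
a larger timestamp as younger; the function name and the returned
labels are assumptions made for the example.

```python
def resolve(conflict, t2_timestamp, t1_timestamp):
    """Decide what happens to T2, the transaction that caused the conflict.

    conflict is "RW", "WR", or "WW"; a larger timestamp means younger.
    """
    t2_is_younger = t2_timestamp > t1_timestamp
    if conflict == "RW":
        return "wait" if t2_is_younger else "read-before-value"
    if conflict == "WR":
        # A granted write may not commit before T1 terminates.
        return "grant-commit-after-T1" if t2_is_younger else "reject"
    if conflict == "WW":
        return "wait" if t2_is_younger else "reject"
    raise ValueError(f"unknown conflict type: {conflict}")
```

Note that an older T2 never waits in this reading: it either reads
the before-value or is rejected.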

The coordinator of an update transaction maintains the
before-list (BL), a list of transactions which read the
before-value of any data object in its write set, and the
after-list (AL), a list of transactions which write the
after-value of any data object in its read set. The BL and AL
are used during the commitment phase of every update
transaction.

When a transaction T2 reads the before-value of a data object
locked by T1, the token site which gives the before-value
conveys the identifier of T2 to the coordinator of T1. Hence,
the identifier of T2 is inserted in the before-list of T1,
which stores all the transactions that read the before-values
of any data object in T1's write-set. The transaction manager
at the read-only site of T2 also conveys the identifier of T1
to the coordinator of T2. As a result, the identifier of T1 is
inserted in the after-list of T2, which stores all the
transactions that write the after-value of any data object in
T2's read-set.

When a transaction terminates (either commits or
aborts), the coordinator of the terminating transaction
must inform the coordinator of each transaction in its AL
about the termination by sending Termination Messages
(TM). On receiving a TM from the coordinator of a
transaction in its BL, the coordinator of the active transaction
removes the identifier of the terminating transaction
(sender of the TM) from the BL. A transaction can
commit only when its BL is empty. In this way, we prevent
non-serializable execution sequences from occurring.
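
The before-list and after-list bookkeeping described above can be illustrated with a minimal sketch. The class and function names (`Coordinator`, `record_before_read`, `terminate`) and the in-memory `registry` standing in for message passing between sites are assumptions made for illustration; they do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Per-transaction coordinator state (hypothetical structure)."""
    tid: str
    # BL: transactions that read a before-value of an object in our write set.
    before_list: set = field(default_factory=set)
    # AL: transactions that write an after-value of an object in our read set.
    after_list: set = field(default_factory=set)

    def can_commit(self) -> bool:
        # A transaction can commit only when its BL is empty.
        return not self.before_list


def record_before_read(reader: Coordinator, writer: Coordinator) -> None:
    """The reader saw the before-value of an object locked by the writer."""
    writer.before_list.add(reader.tid)
    reader.after_list.add(writer.tid)


def terminate(txn: Coordinator, registry: dict) -> None:
    """On commit or abort, deliver a TM to every coordinator in txn's AL."""
    for other_tid in txn.after_list:
        # The receiver removes the sender of the TM from its BL.
        registry[other_tid].before_list.discard(txn.tid)
```

In this sketch, a writer T1 whose before-value was read by T2 cannot commit until `terminate` has been called for T2.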

Update transactions have their own private
workspace where they initially apply their write operations.
Update transactions commit by employing a two-phase
protocol. In the first phase (vote phase), an update
transaction sends an update message to each token-site of
every data object in its write-set. The transaction waits
until it gets a response from all the token-sites for each
data object. If all token-sites vote YES, then the
transaction enters the second phase (commit phase). It sends the
actual value of each data object to be written to the
respective token-sites. Update messages to non-token-sites
can be scheduled after commitment. Therefore, a
temporary and limited difference among object replicas
is permitted; these replicas are required to converge to
the standard ISR consistency as soon as all the update
messages arrive and are processed. An update
transaction that executes its commit phase can never be aborted,
even if it potentially conflicts with another transaction.

Query  transactions  fall  into  three  different  catego¬ 
ries  as  far  as  the  correctness  of  their  response  is  con¬ 
cerned: 


•  Required consistent queries. Queries are specified
as such when they are first submitted by the user, and
they are always guaranteed to return consistent data;

•  Consistent queries. Their final output is correct
regardless of any requirement by the user;

•  Possibly  inconsistent  queries.  In  case  of  such  a 
query,  there  exists  a  small  possibility  that  returned  values 
of  a  replicated  data  object  might  reflect  an  inconsistent 
state  of  the  database. 

Consider a read operation of transaction Ti on a data
object X. If the local copy of X has timestamp >
timestamp(Ti), then the local value is returned. Otherwise, an
Actualization  Request  Message  (ARM)  is  sent  to  any 
available  token-site  to  actualize  the  read-only  copy.  At 
the  token-site,  an  ARM  is  treated  the  same  as  a  read 
request,  and  the  current  version  of  the  data  object  will  be 
returned.  However,  depending  on  their  categories,  query 
transactions  are  not  always  guaranteed  to  return  accurate 
results. 
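
The read path just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: `Copy`, `read`, and the `fetch_from_token_site` callback (standing in for sending an ARM to an available token-site) are hypothetical names, and the strict `>` comparison follows the text as printed.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """A replica of a data object together with its timestamp."""
    value: object
    timestamp: int


def read(local: Copy, txn_timestamp: int, fetch_from_token_site) -> object:
    """Return the local value if it is recent enough, else actualize it first."""
    if local.timestamp > txn_timestamp:
        return local.value
    # Stand-in for an Actualization Request Message: the token-site treats it
    # like a read request and returns the current version of the object.
    current = fetch_from_token_site()
    local.value, local.timestamp = current.value, current.timestamp
    return local.value
```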

3.  Epsilon-Serializability 

Epsilon-serializability (ESR) is a correctness
criterion that enables asynchronous maintenance of mutual
consistency of replicated data [Pu91]. A transaction with
ESR as its correctness criterion is called an
epsilon-transaction (ET). An ET is a query ET if it consists of
only reads. An ET containing at least one write is an
update ET. Query ETs may see an inconsistent data state
introduced by update ETs. The metric to control the level
of inconsistency a query may return is called the overlap.
It is defined as the set of all update ETs that are active and
affecting data objects that the query seeks to access. If a
query ET's overlap is empty, then the query is
serializable.

3.1  Query  Overlap  Considerations 

The overlap of an active query transaction Q can be
used as an upper bound of error on the degree of
inconsistency that Q may accumulate. Given that we are
interested in how many update transactions overlap with Q
more than which transactions those are, the term overlap,
in its further usage, will reflect the cardinality of the set
of update transactions that conflict with the query ET Q.
More formally, query Q's overlap is described as
follows:

$$\mathrm{Overlap}(Q) = \left|\,\{\, U_i \mid U_i \text{ is an update transaction} \;\wedge\; U_i \text{ is active during } Q \;\wedge\; \text{write-set}(U_i) \cap \text{read-set}(Q) \neq \emptyset \,\}\,\right|$$
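
As a small illustration of this definition, the overlap can be counted directly from read and write sets. The sketch below assumes each transaction is described by a start time, an end time, and a set of object names; the type `UpdateTxn` and the interval-intersection test for "active during Q" are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateTxn:
    name: str
    start: int
    end: int
    write_set: frozenset


def overlap(query_start, query_end, query_read_set, updates):
    """Count update transactions active during the query whose write set
    intersects the query's read set."""
    count = 0
    for u in updates:
        active_during_query = u.start <= query_end and u.end >= query_start
        if active_during_query and (u.write_set & query_read_set):
            count += 1
    return count
```

A query with an overlap of zero under this count is serializable in the sense described above.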

Suppose we have a database of A distinct data
objects, and that query transactions read m data objects
on the average, and possibly conflict with update
transactions that update n data objects on the average. The
exact value for the overlap number can be computed as
follows. We compute the maximum allowable overlap of
the query Q for a given degree of query inconsistency p.
The probability that $U_i$ accesses n objects different from
any of the m objects of Q's read set is:

$$p_i = \frac{A-m}{A} \times \frac{A-m-1}{A-1} \times \cdots \times \frac{A-m-n}{A-n}$$

So the probability that $U_i$ has common elements
with Q (i.e. Q overlapping with $U_i$) is $1 - p_i$.

The probability for a query transaction to overlap
with an arbitrarily chosen update transaction is:

$$l = 1 - p_i = 1 - \frac{A-m}{A} \times \cdots \times \frac{A-m-n}{A-n} = 1 - \frac{(A-m)!\,(A-n-1)!}{A!\,(A-m-n-1)!}$$
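
The product form and the factorial form of this probability can be checked against each other numerically. The sketch below follows the product as printed, with factors running from (A-m)/A to (A-m-n)/(A-n), and uses exact rational arithmetic; the function names are illustrative, and the code assumes A - m - n - 1 >= 0.

```python
from fractions import Fraction
from math import factorial


def no_overlap_product(A: int, m: int, n: int) -> Fraction:
    """p_i as the product (A-m)/A * (A-m-1)/(A-1) * ... * (A-m-n)/(A-n)."""
    p = Fraction(1)
    for j in range(n + 1):
        p *= Fraction(A - m - j, A - j)
    return p


def no_overlap_closed_form(A: int, m: int, n: int) -> Fraction:
    """The same quantity written as (A-m)! (A-n-1)! / (A! (A-m-n-1)!)."""
    return Fraction(
        factorial(A - m) * factorial(A - n - 1),
        factorial(A) * factorial(A - m - n - 1),
    )


def overlap_probability(A: int, m: int, n: int) -> Fraction:
    """Probability that a query overlaps an arbitrary update transaction."""
    return 1 - no_overlap_closed_form(A, m, n)
```

For example, with A = 4, m = 1, n = 1 both forms give 3/4 × 2/3 = 1/2, so the overlap probability is also 1/2.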


Variable $l_i$ essentially represents the inconsistency
probability that a query overlaps with exactly one update
transaction (k = 1). If k is the maximum overlap, then the
inequality

$$\sum_{i=1}^{k} l_i \le p$$

must hold. We emphasize that we
have k distinct update transactions that potentially
conflict with the query.

Since we assume that read/update sets are uniformly
distributed within the database, we have $l_i = l$ for all $i \le k$, and
thus $k \times l \le p$. Solving this inequality for k after substituting
the value for $l$, the overlap bound k is:

$$k \le \frac{p}{1 - \dfrac{(A-m)!\,(A-n-1)!}{A!\,(A-m-n-1)!}}$$

Even though this choice of the overlap bound k is
reasonable, it is not unique or critical. The algorithm to
be presented in Section 5 will work with other choices of
k, or even in its absence.

3.2 E-Transaction Compatibility

Among several replica control methods based on
ESR, we have chosen the ordered updates approach
[Pu91]. The ordered updates approach allows more
concurrency than ISR in two ways. First, query ETs can be
processed in any order because they are allowed to see
intermediate, inconsistent results. Second, update ETs
may update different replicas of the same object
asynchronously, but in the same order. In this way, update ETs
produce results equivalent to a serial schedule; these
results are therefore consistent.

There are two categories of transaction conflicts that
we examine: conflicts between update transactions and
conflicts between update and query transactions.

Conflicts between update transactions can be either
RW conflicts or WW conflicts. Both types must be strictly
resolved. No correctness criteria can be relaxed here,
since execution of update transactions must remain ISR
in order for replicas of data objects to remain identical.

Conflicts between update and query transactions are
of RW type. Each time a query conflicts with an update,
we say that the query overlaps with this update, and the
overlap counter is incremented by one. If the counter is
still less than a specified upper bound (i.e. the value of k
derived above), then both operation requests are
processed normally, the conflict is ignored, and no
transaction is aborted. Otherwise, the RW conflict must be resolved
by using the conventional ISR correctness criteria of the
accommodating algorithm.

The performance gains of the above conflict
resolution policies are numerous. Update transactions are
rarely blocked or aborted in favor of query transactions.
They may be delayed on behalf of other update
transactions in order to preserve internal database consistency.
On the other hand, query transactions are almost never
blocked provided that their overlap upper bound is not
exceeded. Finally, update transactions attain the
flexibility to write replicas in an asynchronous manner.

4. Real-Time Issues

In real-time databases, transactions are
characterized by their timing constraints and their data and
computation requirements. Timing constraints are expressed
through the release time and the deadline. Computation
requirements for transactions are unknown, and no
run-time estimate is available for any transaction that
enters the system. Neither are data requirements known
beforehand, but they are discovered dynamically as the
transaction executes. Our goal is to minimize the number
of transactions that miss their deadlines [Abb88].

The real-time scheduling part of our scheme has
three components: a policy to determine which
transactions are eligible for service, a policy for assigning
priorities to transactions, and a policy for resolving conflicts
between two transactions that want to lock the same data
object. None of these policies needs any more
information about transactions than the deadline and the name of
the data object currently being accessed.

All  transactions  which  are  currently  not  tardy  are 
eligible  for  service.  Transactions  that  have  already 
missed  their  deadlines  are  immediately  aborted.  When  a 
transaction  is  accepted  for  service  at  the  local  site  where 
it  was  originally  submitted,  it  is  assigned  a  priority 
according  to  its  deadline.  The  transaction  with  the  earli¬ 
est  deadline  has  the  highest  priority.  This  policy  meshes 
efficiently  with  the  “not  tardy”  eligibility  policy  adopted 
above,  so  that  transactions  that  have  already  missed  their 
deadlines  are  automatically  screened  out  before  any  pri¬ 
ority  is  assigned  to  them.  High  priority  is  the  policy  that 
is  employed  for  resolving  transaction  conflicts.  Transac¬ 
tions  with  the  highest  priorities  are  always  favored.  The 
favored  transaction,  i.e.  the  winner  of  the  conflict,  gets 
the  resources  that  it  needs  to  proceed  (e.g.,  data  locks  and 
the  processor  [Car89]).  The  loser  of  the  conflict  relin¬ 
quishes  control  of  any  resources  that  are  needed  by  the 
winner.  The  loser  transaction  will  either  be  aborted  or 
blocked  depending  on  the  relative  age  of  the  two  con¬ 
flicting  transactions  and  the  special  provisions  made  by 
the  replication  control  scheme. 
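
The three policies above (the "not tardy" eligibility test, earliest-deadline priority assignment, and High Priority conflict resolution) can be sketched in a few lines. The names below are illustrative, and treating a transaction whose deadline equals the current time as tardy is an assumption of this sketch, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float


def eligible(txns, now):
    """Keep transactions that are not tardy; tardy ones would be aborted."""
    return [t for t in txns if t.deadline > now]


def by_priority(txns):
    """Earliest deadline first: the earliest deadline is the highest priority."""
    return sorted(txns, key=lambda t: t.deadline)


def conflict_winner(a: Txn, b: Txn) -> Txn:
    """High Priority policy: the higher-priority (earlier-deadline) one wins."""
    return a if a.deadline <= b.deadline else b
```

Whether the loser of a conflict is then blocked or aborted is decided by the replication control scheme and is deliberately left out of this sketch.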

5.  Replication  Control  Scheme 

In this section, we present the token-based
replication control scheme in detail, along with the embedded
ESR correctness criteria and real-time constraints.

5.1  Controlling  Inconsistency  of  Queries 

Queries  are  only  involved  in  RW/WR  conflicts. 
When a query transaction is submitted to the system, the
user may qualify it with the restriction "required to be
consistent." Such a characterization means that all
possible future RW/WR conflicts between this query and
update  transactions  will  have  to  be  resolved  in  a  strict 
(ISR)  way.  In  other  words,  consistent  queries  (CQs)  are 
treated  in  the  same  fashion  as  update  transactions.  Values 
returned  by  CQs  are  always  correct,  reflecting  the  up-to- 
date  state  of  the  respective  data  objects. 

If no consistency constraints are specified explicitly
by the user on a submitted query, then the ESR
correctness criterion is employed to maintain the query's
consistency. The overlap upper bound is computed, and an
overlap counter is initialized to zero. Each time the query
conflicts with an update transaction over the same data
object and the counter is less than the overlap upper
bound, the conflict is ignored, the counter is
incremented, the query reads the value of the data object in
question and proceeds to read the next object. When the
overlap counter is found to be equal to the upper bound,
current and all subsequent conflicts must be resolved in a
strict manner, so that no more inconsistency will be
accumulated on the query.
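
The counter logic above can be sketched as a small state object. This is an illustration under stated assumptions: the bound is taken as the largest integer k with k × l ≤ p (computed as `floor(p / l)`), a "required consistent" query is modelled as a bound of zero, and the class and method names are hypothetical.

```python
from math import floor


class QueryState:
    """Overlap bookkeeping for a single query transaction."""

    def __init__(self, p, l, required_consistent=False):
        # p: tolerated inconsistency; l: probability of overlapping one update.
        # A consistent query (CQ) tolerates no unresolved conflict at all.
        self.bound = 0 if required_consistent else floor(p / l)
        self.counter = 0

    def on_conflict(self) -> str:
        """Return 'ignore' while under the bound, else 'strict' (ISR)."""
        if self.counter < self.bound:
            self.counter += 1
            return "ignore"
        return "strict"
```

Passing `fractions.Fraction` values for `p` and `l` avoids floating-point rounding when the ratio is close to an integer. After the query commits, a counter of zero corresponds to a consistent result, while a positive counter marks the query as possibly inconsistent.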

When a query transaction eventually commits, the
user is able to determine the degree of correctness of the
data values returned. If the query was qualified as a CQ,
then the user can be confident that the values returned are
coherent. For regular query transactions, the private
overlap counter is checked. If the counter is still zero,
this means that no conflict has occurred throughout the
entire execution of the query and the results must again
be perfectly accurate. Such a query falls into the CQ
class. An overlap counter greater than zero indicates that
a certain number of conflicts with update transactions
remained unresolved; the query had seen some possibly
inconsistent intermediate states, and might yield some
inaccurate data. This last type of query falls into the
"possibly inconsistent" queries class. The probability
that such a query outputs inconsistent data is bounded by
the probability p which was used in the calculation of the
overlap limit k. Data values can then be referenced with
{(1 - p) × 100}% confidence in their correctness.

Since arbitrary queries may produce results beyond
the allowed inconsistency even within their overlap limit, it is
important to restrict ET queries to have certain properties
that permit tight inconsistency bounds. A first attempt at
this approach is proposed in [Ram91]. It is beyond the
scope of this paper to deal with such strategies. In the
remainder of the paper, we assume that inconsistency
bounds can be enforced by the system if necessary.

For each query transaction T, we can also provide
the number of possibly incorrect values read by T by
checking the overlap counter of T and the number of data
objects read by T. Let $m_{tx}$ be the exact number of data
objects read by T and $k_{tx}$ be the value of the overlap
counter of T after T is terminated. The number of
possibly incorrect data values read by T is $m_{tx} \times P_{tx}$, where

$$P_{tx} = k_{tx} \times \left(1 - \frac{(A-m_{tx})!\,(A-n-1)!}{A!\,(A-m_{tx}-n-1)!}\right)$$

is the exact probability that T is inconsistent.

5.2  Conflict  Resolution 

Mechanisms  for  conflict  resolution  between  update 
transactions  comprise  the  core  of  our  scheme.  Query 
transactions  need  not  be  considered  separately  because 
queries  that  are  forced  to  resolve  their  RW  conflicts  with 
update  transactions  can  be  treated  as  update  transactions. 
Therefore,  in  the  rest  of  the  section,  we  use  the  general 
term  transaction  when  we  refer  either  to  a  normal  update 


              T2 higher priority                  T2 lower priority

T2 younger    • T1 reads before-value.            • T1 reads before-value.
              • T2 writes.                        • T2 writes.
              • T2 allows T1 to commit            • T2 waits for T1 to commit
                before it commits.                  before it commits.
              • If T2 requests to commit,
                T1 is (cond) aborted,
                T2 commits.

T2 older      • T1 is aborted (cond).             • T1 reads.
              • T2 writes.                        • T2 is aborted.

Table 3: W-R Conflicts


lower priority. In the case that T2 has a higher priority,
aborting T2 would violate the real-time constraints.
Therefore, we let T2 proceed and write a new value for X
while T1 is aborted, since it has seen a value of X that has
already become obsolete.

(3) W-W Conflict.

Transaction T2 requests to write data object X for
which transaction T1 has already issued a write request.
Table 4 shows the various resolution policies.

If T2 is younger than T1, then T2 should wait for the
termination of T1 before it writes a new value for data
object X. Such conflict resolution favors T1 and is
compatible with the situation where T2 has lower priority
than T1. However, when T2 has a higher priority, it is not
required to wait for the lower priority transaction T1.
Hence, T2 will proceed, and T1 will be conditionally
aborted in order for the database to remain internally
consistent.

If T2 is older than T1, then T2 should be aborted.
Note that we are interested only in the most recent value
of X, i.e. the value written by the younger T1 transaction.
In the case that T2 has a lower priority, the above
resolution is acceptable, since the higher priority T1 is favored
to proceed. On the contrary, when T2 has the higher
priority, T2 must be allowed to write its own new value of
X, and T1 must be conditionally aborted for the database
to remain consistent with respect to the data object X.

5.3  Commitment 

The  coordinator  of  a  transaction  decides  to  commit 
when  the  following  conditions  are  satisfied: 

•  The transaction must not have missed its deadline;

•  Each data object in the read-set of the transaction
is read;

•  All the token-sites of each data object in the
write-set of the transaction have precommitted (this only
applies to update transactions);

•  There is no active transaction that has seen the
before-value of any data object in the transaction's write-set. In
other words, the before-list of the transaction must be
empty (this only applies to update transactions).
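
The four commit conditions listed above can be written as a single predicate. This is a sketch under assumed data structures: `TxnState` and its fields (for example `token_sites` and `precommitted`, which map each object to sets of site names) are hypothetical and chosen only to make the conditions checkable.

```python
from dataclasses import dataclass, field


@dataclass
class TxnState:
    deadline: float
    read_set: set
    objects_read: set = field(default_factory=set)
    is_update: bool = False
    write_set: set = field(default_factory=set)
    token_sites: dict = field(default_factory=dict)    # object -> all token-sites
    precommitted: dict = field(default_factory=dict)   # object -> sites that precommitted
    before_list: set = field(default_factory=set)


def may_commit(t: TxnState, now: float) -> bool:
    """Check the coordinator's commit conditions for transaction t."""
    if now > t.deadline:                     # deadline must not be missed
        return False
    if not t.read_set <= t.objects_read:     # every object in the read-set is read
        return False
    if t.is_update:
        for obj in t.write_set:              # every token-site has precommitted
            if t.precommitted.get(obj, set()) != t.token_sites.get(obj, set()):
                return False
        if t.before_list:                    # the before-list must be empty
            return False
    return True
```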


Table  4:  W-W  Conflicts 


6.  Concluding  Remarks 
[...] which query transactions have to comply. Instead of
applying ISR to all transactions, ISR is applied only to
updates, and queries are left free to be interleaved with
updates in a more flexible way.

By relaxing the consistency criteria for query
transactions, queries and updates hardly ever have to abort or
block each other due to conflicts between them. As an
immediate consequence of this, more transactions may
terminate successfully before their deadlines expire.
Additionally, the second mechanism further improves
performance; updating the different replicas of the same
[...] transactions are free to proceed to the next step of their
execution or even to commit. Internal database
consistency is preserved strictly. Data returned by certain
queries are allowed to exhibit limited inconsistency, under
user control.

Another  advantage  of  our  scheme  lies  in  the  fact 
that  there  is  very  little  information  the  user  has  to  provide 
to achieve efficient system operation. No a priori
knowledge of the kind or the number of the data objects that are
included  in  the  read-set  or  the  write-set  of  a  transaction 
is  needed.  The  only  information  required  is  the  kind  of 
each  submitted  transaction  (query  or  update),  and  the 
expected  average  number  of  objects  accessed  by  each 
transaction.  Moreover,  no  execution  time  estimate  is 
required  for  each  submitted  transaction.  It  would  be 
extremely  difficult  to  compute  a  run-time  estimate,  espe¬ 
cially  in  the  distributed  environments  for  which  our 
scheme  is  designed. 

There  is  a  price  to  pay  for  relaxing  correctness  crite¬ 
ria  and  meeting  more  deadlines.  Although  the  user  can 
control  the  maximum  permissible  inconsistency  of  que¬ 
ries,  one  cannot  know  exactly  which  one  transaction  out 
of  the  set  of  all  possibly  inconsistent  queries  will  return 
incorrect data, unless a tight inconsistency bound is
provided. Note that an overlap counter greater than zero
does not necessarily mean that the respective query
transaction is inconsistent. It simply indicates that certain
RW/WR  conflicts  were  passed  unresolved,  and  inconsis¬ 
tency  might  be  present  among  the  data  values  returned. 

Compared with traditional databases, real-time database systems
have a distinct feature: they must satisfy timing constraints
associated with transactions. This requires that transactions in
real-time database systems should be scheduled to consider both data
consistency and timing constraints. The paper addresses the
issues associated with transaction scheduling and concurrency
control in real-time database systems. As a specific example of
real-time transaction scheduling, a priority-based scheduling
algorithm is discussed, together with a performance study using a
database prototyping environment.

real-time systems, databases, prototyping, synchronisation,
transaction, priority


As computers are becoming an essential part of real-time
systems, real-time computing is emerging as an
important discipline in computer science and engineering. The
growing importance of real-time computing in a large
number of applications, such as aerospace and defence
systems, industrial automation and robotics, and nuclear
power plants, has resulted in increased research in this
area. Researchers working in the real-time systems area
have found that traditional data models are not adequate
for real-time systems. In recent workshops, the need for
more active research in database systems that satisfy
timing constraints in collecting, updating, and retrieving
shared data has been pointed out. Most database
systems are not designed for real-time applications and
lack the features required to support real-time
transactions. Few conventional database systems allow users to
specify timing constraints or ensure that the system
meets those set by the user. Interest in this new
application domain is also growing in the database
community. Recently, a number of research results have appeared
in the literature.

Real-time  database  systems  have  (at  least  some)  tran¬ 
sactions  with  explicit  timing  constraints.  Typically,  a 
timing  constraint  is  expressed  in  the  form  of  a  deadline,  a 
certain  time  in  the  future  by  which  a  transaction  needs  to 
be  completed.  A  deadline  is  said  to  be  ‘hard’  if  it  cannot 
be  missed  or  else  the  result  is  useless.  If  a  deadline  can  be 
missed,  it  is  a  ‘soft’  deadline.  With  soft  deadlines,  the 
usefulness  of  a  result  may  decrease  after  the  deadline  is 
missed. In real-time database systems, the correctness of
transaction  processing  depends  not  only  on  maintaining 
consistency  constraints  and  producing  correct  results, 
but  also  on  the  time  at  which  a  transaction  is  completed. 
Transactions  must  be  scheduled  in  such  a  way  that  they 
can  be  completed  before  their  corresponding  deadlines 
expire.  For  example,  both  the  update  and  query  on  the 
tracking  data  for  a  missile  must  be  processed  within 
given  deadlines,  satisfying  not  only  database  consistency 
constraints,  but  also  timing  constraints. 

Conventional  database  systems  are  typically  not  used 
in  real-time  applications  due  to  the  inadequacies  of  poor 
performance  and  lack  of  predictability.  They  are 
designed  to  provide  good  average  performance,  while 
possibly  yielding  unacceptable  worst-case  response 
times.  In  addition,  conventional  database  systems  do  not 
schedule  their  transactions  to  meet  response  require¬ 
ments  and  they  commonly  lock  data  tables  to  assure  only 
the  consistency  of  the  database.  Locks  and  time-driven 
scheduling  are  basically  incompatible,  resulting  in  res¬ 
ponse  requirement  failures  when  low-priority  transac¬ 
tions  block  higher-priority  transactions.  New  techniques 
are  required  to  manage  the  consistency  of  real-time  data¬ 
bases,  and  they  should  be  compatible  with  time-driven 
scheduling  and  meet  both  the  required  temporal  con¬ 
straints  and  data  consistency. 

To  address  the  inadequacies  of  current  database 
systems,  the  transaction  scheduler  needs  to  be  able  to 
take  advantage  of  the  semantic  and  timing  information 
associated  with  data  objects  and  transactions.  A  model 
of  real-time  transactions  needs  to  be  developed  that  char¬ 
acterizes  distinctive  features  of  real-time  databases  and 
can  contribute  to  the  improved  responsiveness  of  the 
system.  The  semantic  information  of  the  transactions 
investigated in the modelling study can be used to
develop efficient transaction schedulers.

The  satisfying  of  timing  constraints  while  preserving 
data  consistency  requires  the  concurrency  control  proto¬ 
col  to  accommodate  timeliness  of  transactions  as  well  as 
data  consistency  requirements.  In  real-time  database 
systems,  timeliness  of  a  transaction  is  usually  combined 
with  its  criticality  to  take  the  form  of  the  priority  of  the 
transaction.  Therefore,  proper  management  of  priorities 
and  conflict  resolution  in  real-time  transaction  schedul¬ 
ing  are  essential  for  predictability  and  responsiveness  of 
real-time  database  systems. 

This paper addresses the issues associated with
transaction scheduling and concurrency control in real-time
database  systems.  First,  a  priority-based  scheduling 
approach  for  real-time  database  systems  is  introduced. 
As  a  specific  example  of  real-time  transaction  scheduling, 
the  priority-ceiling  protocol  is  discussed,  together  with  a 
performance  study  using  a  database  prototyping 
environment.  Other  issues  in  scheduling  real-time  tran¬ 
sactions  are  also  discussed. 

PRIORITY-BASED  SCHEDULING 

Real-time  databases  are  often  used  in  applications  such 
as  tracking.  Tasks  in  such  applications  consist  of  both 
computing  (signal  processing)  and  database  accessing 
(transactions).  A  task  can  have  multiple  transactions, 
which  consist  of  a  sequence  of  read  and  write  operations 
that  operate  on  the  database.  Each  transaction  will 
follow the two-phase locking protocol, which requires a
transaction  to  acquire  all  the  locks  before  it  releases  any 
lock.  Once  a  transaction  releases  a  lock,  it  cannot  acquire 
any  new  lock.  A  high-priority  task  will  preempt  the 
execution  of  lower-priority  tasks  unless  it  is  blocked  by 
the  locking  protocol  at  the  database. 

In a real-time database system, scheduling protocols
must not only maintain the consistency constraints of the
database, but also satisfy the timing requirements of the
transactions that access the database. To satisfy both the
consistency and real-time constraints, it is necessary to
integrate synchronization protocols with real-time
priority-scheduling protocols. Due to the effect of blocking in
lock-based synchronization protocols, a direct
application of a real-time scheduling algorithm to
transactions may result in a condition known as priority
inversion. Priority inversion is said to occur when a
higher-priority task is forced to wait for the execution of a
lower-priority task for an indefinite period. When two
transactions attempt to access the same data object, the
access must be serialized to maintain consistency. If the
transaction of the higher-priority task gains access first,
then the proper priority order is maintained; however, if
the lower-priority transaction gains access first and then
the higher-priority transaction requests access to the
data object, this higher-priority task will be blocked until
the lower-priority transaction completes its access to the
data object. Priority inversion is inevitable in transaction
systems. To achieve a high degree of schedulability in
real-time applications, however, priority inversion must
be minimized. This is illustrated by the following
example.

Example  1 

Suppose T1, T2, and T3 are three transactions arranged in
descending order of priority, with T1 having the highest
priority. Assume that T1 and T3 access the same data
object O1. Suppose that at time t1 transaction T3 obtains a
lock on O1. During the execution of T3, the high-priority
transaction T1 arrives, preempts T3, and later attempts to
access the object O1. Transaction T1 will be blocked, as O1
is already locked. It would be expected that T1, being the
highest-priority transaction, will be blocked no longer
than the time for transaction T3 to complete and unlock
O1. However, the duration of blocking may, in fact, be
unpredictable. This is because transaction T3 can be
blocked by the intermediate-priority transaction T2,
which does not need to access O1. The blocking of T3, and
hence that of T1, will continue until T2 and any other
pending intermediate-priority-level transactions are
completed.

The  blocking  duration  in  the  example  above  can  be 
arbitrarily long. This situation can be partially remedied
if transactions are not allowed to be preempted;
however, this solution is only appropriate for short
transactions, because it creates unnecessary blocking. For
instance,  once  a  long  low-priority  transaction  starts 
execution,  a  high-priority  transaction  that  does  not 
require  access  to  the  same  set  of  data  objects  may  be 
needlessly  blocked. 

An approach to this problem, based on the notion of
priority inheritance, has been proposed. The basic idea
of  priority  inheritance  is  that  when  a  transaction  T  of  a 
task  blocks  higher-priority  tasks,  it  executes  at  the  high¬ 
est  priority  of  all  the  transactions  blocked  by  T.  This 
simple  idea  of  priority  inheritance  reduces  the  blocking 
time  of  a  higher-priority  transaction,  by  solving  the 
unbounded  priority  inversion  problem.  In  the  context  of 
preemptive  scheduling,  a  higher-priority  transaction  T 
can  preempt  the  execution  of  lower-priority  transactions 
unless  T  is  blocked  by  the  locking  protocol.  The  priority 
inheritance  rule  states  that  when  a  transaction  blocks  the 
execution  of  higher-priority  transactions,  it  executes  at 
the  highest  priority  of  all  the  transactions  blocked  by  its 
locks.  For  example,  suppose  transaction  T1  is  blocked  by
T3.  Then  the  priority-inheritance  protocol  ensures  that  T3
will  execute  at  T1's  priority  until  it  releases  the  lock  on
the  data  object  for  which  T1  is  blocked.
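
The inheritance rule can be illustrated with a minimal sketch. The class `Txn`, the function `effective_priority`, and the convention that a larger number means a higher priority are assumptions made for illustration only; the sketch also assumes that the blocking relation contains no cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Txn:
    name: str
    base_priority: int                   # assumption: larger number = higher priority
    blocked_by: Optional["Txn"] = None   # holder of the lock this transaction waits for


def effective_priority(txn, txns):
    """Base priority raised to the highest priority of any transaction
    that txn blocks, directly or transitively (no blocking cycles assumed)."""
    best = txn.base_priority
    for other in txns:
        if other.blocked_by is txn:
            best = max(best, effective_priority(other, txns))
    return best


# Three transactions in descending priority order; T1 waits for a lock held by T3.
t1, t2, t3 = Txn("T1", 3), Txn("T2", 2), Txn("T3", 1)
txns = [t1, t2, t3]
t1.blocked_by = t3
```

While T1 is blocked, T3 runs at T1's priority and can no longer be preempted by T2; once the lock is released and `blocked_by` is cleared, T3 falls back to its base priority.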

The  priority  inheritance  alone,  however,  is  inadequate 
because  the  blocking  duration  for  a  transaction,  though 
bounded,  can  still  be  substantial  due  to  the  potential 
chain  of  blocking.  For  instance,  suppose  that  transaction 
T1  needs  to  access  sequentially  objects  O1  and  O2.  Also
suppose  that  T2  preempts  T3,  which  has  already  locked
O2.  Then,  T2  locks  O1.  Transaction  T1  arrives  at  this
instant  and  finds  that  the  objects  O1  and  O2  have  been
locked  by  the  lower-priority  transactions  T2  and  T3,
respectively.  As  a  result,  T1  would  be  blocked  for  the
duration  of  two  transactions,  once  to  wait  for  T2  to
release  O1  and  again  to  wait  for  T3  to  release  O2.  Thus  a
chain  of  blocking  can  be  formed. 

One  idea  for  dealing  with  this  inadequacy  is  to  use  a 
total  priority  ordering  of  active  transactions*.  A  transac¬ 
tion  is  said  to  be  active  if  it  has  started  but  not  yet 
completed  its  execution.  A  transaction  can  be  active  in 
one  of  two  states:  either  executing  or  being  preempted  in
the  middle  of  its  execution.  The  idea  of  total  priority 
ordering  is  that  the  real-time  locking  protocol  ensures 
that  each  active  transaction  is  executed  at  some  priority 
level,  taking  priority  inheritance  and  read/write  seman¬ 
tics  into  consideration. 

TOTAL  ORDERING  BY  PRIORITY  CEILING

To  ensure  the  total  priority  ordering  of  active  transac¬ 
tions,  three  priority  ceilings  are  defined  for  each  data 
object  in  the  database:  the  write-priority  ceiling,  the 
absolute-priority  ceiling,  and  the  rw-priority  ceiling.  The
write-priority  ceiling  of  a  data  object  is  defined  as  the 
priority  of  the  highest-priority  transaction  that  may 
write  into  this  object,  and  the  absolute-priority  ceiling  is 
defined  as  the  priority  of  the  highest-priority  transaction 
that  may  read  or  write  the  data  object.  The  rw-priority 
ceiling  is  set  dynamically.  When  a  data  object  is  write- 
locked,  the  rw-priority  ceiling  of  this  data  object  is  equal 
to  the  absolute  priority  ceiling.  When  it  is  read-locked, 
the  rw-priority  ceiling  of  this  data  object  is  equal  to  the 
write-priority  ceiling. 

The  reason  for  specifying  the  rw-priority  ceiling  differ¬ 
ently  depending  on  the  lock  type  set  on  the  data  object  is 
lock  compatibility.  When  a  data  object  is  write-locked,  it 
cannot  be  read  or  written  by  another  transaction.  To 
ensure  this,  the  rw-priority  ceiling  of  the  data  object  is  set 
to  its  absolute-priority  ceiling.  As  the  absolute-priority 
ceiling  of  a  data  object  is  equal  to  the  priority  of  the 
highest-priority  transaction  that  may  either  read  or  write 
it,  it  prevents  another  task  from  reading  or  writing  until 
the  lock  is  released.  Similarly,  if  it  is  read-locked,  it 
cannot  be  written  by  another  transaction.  To  ensure  this, 
when  a  data  object  is  read-locked  by  a  transaction,  its 
rw-priority  ceiling  is  set  to  its  write-priority  ceiling.  As 
the  write-priority  ceiling  equals  the  priority  of  the  high¬ 
est-priority  transaction  that  may  write  it,  it  prevents 
another  transaction  from  writing  the  data  object. 
According  to  the  rw-priority-ceiling  rule,  the  system  can
guarantee  that  a  data  object  can  be  locked  by  a  transaction
T  only  if  T's  priority  is  higher  than  the  rw-priority
ceilings  of  all  data  objects  currently  locked  by  transactions
other  than  T  in  an  incompatible  mode.
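
The three ceilings defined above can be computed as in the following sketch. The names `Txn`, `write_ceiling`, `absolute_ceiling`, `rw_ceiling`, and the sentinel `NO_CEILING` are hypothetical; the sketch assumes that priorities are positive integers (larger means higher) and that the read and write sets of every transaction are known in advance.

```python
from dataclasses import dataclass

NO_CEILING = 0  # assumption: real priorities are positive, so 0 blocks nobody


@dataclass
class Txn:
    name: str
    priority: int                       # assumption: larger number = higher priority
    read_set: frozenset = frozenset()
    write_set: frozenset = frozenset()


def write_ceiling(obj, txns):
    """Priority of the highest-priority transaction that may write obj."""
    return max((t.priority for t in txns if obj in t.write_set),
               default=NO_CEILING)


def absolute_ceiling(obj, txns):
    """Priority of the highest-priority transaction that may read or write obj."""
    return max((t.priority for t in txns
                if obj in t.read_set or obj in t.write_set),
               default=NO_CEILING)


def rw_ceiling(obj, lock_mode, txns):
    """Dynamic ceiling: absolute ceiling under a write lock,
    write ceiling under a read lock."""
    if lock_mode == "write":
        return absolute_ceiling(obj, txns)
    if lock_mode == "read":
        return write_ceiling(obj, txns)
    raise ValueError("lock_mode must be 'read' or 'write'")


# T1 (highest) only reads A, T2 writes A, T3 (lowest) only reads A.
txns = [
    Txn("T1", 3, read_set=frozenset({"A"})),
    Txn("T2", 2, write_set=frozenset({"A"})),
    Txn("T3", 1, read_set=frozenset({"A"})),
]
```

In this configuration a read lock on A yields an rw-priority ceiling of 2, so the reader T1 is not held back by it, whereas a write lock raises the ceiling to 3 and excludes every other transaction.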

The  priority-ceiling  protocol  is  premised  on  systems 
with  a  fixed  priority  scheme.  The  protocol  consists  of  two 
mechanisms:  priority  inheritance  and  priority  ceiling. 
The  combination  of  these  two  mechanisms  gives  the 
properties  of  freedom  from  deadlock  and  a  worst-case 
blocking  of  at  most  a  single  lower-priority  transaction. 

When  a  transaction  attempts  to  lock  a  data  object,  the 
transaction's  priority  is  compared  with  the  highest  rw- 
priority  ceiling  of  all  data  objects  currently  locked  by 
other  transactions.  If  the  priority  of  the  transaction  is  not 
higher  than  the  rw-priority  ceiling,  the  access  request  will 
be  denied,  and  the  transaction  will  be  blocked.  In  this 
case,  the  transaction  is  said  to  be  blocked  by  the  transac¬ 
tion  that  holds  the  lock  on  the  data  object  of  the  highest 
rw-priority  ceiling.  Otherwise,  it  is  granted  the  lock.  In 
the  denied  case,  the  priority  inheritance  is  performed  to 
overcome  the  problem  of  uncontrolled  priority  inversion.
For  example,  if  transaction  T  blocks  higher-priority
transactions,  T  inherits  PH,  the  highest  priority  of  the
transactions  blocked  by  T.  Priority  inheritance  is  transitive.
The  next  example  shows  how  transactions  are  scheduled 
under  the  priority-ceiling  protocol. 


Example  2 

Consider  the  same  situation  as  in  example  1.  According
to  the  protocol,  the  priority  ceiling  of  Oi  is  the  priority  of
T1.  When  T2  tries  to  access  a  data  object,  it  is  blocked
because  its  priority  is  not  higher  than  the  priority  ceiling
of  Oi.  As  T3  blocks  T2,  the  priority  of  T3  is  promoted  to
that  of  T2.  When  T1  requests  Oi,  it  will  be  blocked,  and  the
priority  of  T3  will  be  promoted  to  that  of  T1.  When  T3
unlocks  Oi,  the  priority  of  T3  returns  to  its  original  level.
At  that  point,  T1  will  preempt  T3  and  will  lock  Oi.
Therefore,  T1  will  be  blocked  only  once  by  T3  to  access
Oi,  regardless  of  the  number  of  data  objects  it  may
access.
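
The lock-granting rule, applied to the situation of this example, can be sketched as follows. The function `request_lock`, the dictionary layout of `held_locks`, and the numeric priorities (3, 2, 1 for the highest, intermediate, and lowest transaction) are assumptions for illustration, not part of the protocol definition.

```python
def request_lock(requester, priority, held_locks):
    """Decide a lock request under the rw-priority-ceiling rule.

    held_locks maps an object name to (holder name, rw-priority ceiling).
    Returns (granted, blocker); blocker is the holder of the locked object
    with the highest rw-priority ceiling when the request is denied.
    """
    others = [(ceiling, holder)
              for holder, ceiling in held_locks.values()
              if holder != requester]
    if not others:
        return True, None
    ceiling, holder = max(others)
    if priority > ceiling:
        return True, None
    return False, holder


# The lowest-priority transaction T3 holds the object; its ceiling is the
# priority of the highest-priority transaction T1 (here 3).
held = {"Oi": ("T3", 3)}
```

With these values, a request by T2 for any data object is denied with T3 as the blocker, even if the requested object is itself unlocked; this is the ceiling blocking described in the text.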

Using  the  priority-ceiling  protocol,  mutual  deadlock 
of  transactions  cannot  occur  and  each  transaction  can  be 
blocked  by  at  most  one  lower-priority  transaction  until  it 
completes  or  suspends  itself.  A  high-priority  transaction 
can  be  blocked  by  a  low-priority  transaction  in  one  of 
three  cases. 

•  The  first  case  occurs  when  a  high-priority  transaction 
attempts  to  lock  a  data  object  already  locked  by  a  low- 
priority  transaction. 

•  The  second  case  occurs  when  a  medium-priority  tran¬ 
saction  is  blocked  by  a  low-priority  transaction  that 
has  promoted  its  priority  by  inheriting  that  of  a  high- 
priority  transaction.  This  type  of  blocking  is  necessary 
to  avoid  a  situation  in  which  a  high-priority  transac¬ 
tion  is  indirectly  blocked  by  a  medium-priority  tran¬ 
saction. 

•  The  third  type  of  blocking  is  called  ceiling  blocking, 
which  occurs  when  a  transaction  cannot  start  the 
execution  because  its  priority  is  not  higher  than  the 
priority  ceiling  of  the  data  objects  locked  by  other 
active  transactions.  Ceiling  blocking  is  necessary  to 
avoid  deadlock  and  chained  blocking. 

The  total  priority  ordering  of  active  transactions  leads  to 
some  interesting  behaviour.  As  shown  in  example  2,  the 
priority-ceiling  protocol  may  forbid  a  transaction  from 
locking  an  unlocked  data  object.  At  first  sight,  this  seems 
to  introduce  unnecessary  blocking.  However,  this  can  be 
considered  as  the  ‘insurance  premium’  for  preventing 
deadlock  and  achieving  the  block-at-most-once  property.

PERFORMANCE  EVALUATION 

The  issues  associated  with  the  idea  of  total  ordering  in 
priority-based  scheduling  protocols  have  been  investi¬ 
gated  using  a  database  prototyping  environment'*.  One 
of  the  critical  issues  related  to  the  total  ordering 
approach  is  its  performance  compared  with  other  design 
alternatives.  In  other  words,  it  is  important  to  determine
the  actual  cost  of  the  ‘insurance  premium’  of  the
total-priority-ordering  approach.  The  results  indicate
that  the  ceiling  protocol  offers  performance  improve¬ 
ment  over  the  two-phase  locking  protocol  (2PL). 

In  the  author’s  experiments,  transactions  are  gener¬ 
ated  and  put  into  the  start-up  queue.  When  a  transaction 
is  started,  it  leaves  the  start-up  queue  and  enters  the
ready  queue.  Transactions  in  the  ready  queue  are
ordered  from  the  highest  priority  to  the  lowest  priority.
The  transaction  with  the  highest  priority  is  always
selected  to  run.  The  current  running  transaction  sends
requests  to  the  concurrency  controller.  The  transaction 
may  be  blocked  and  placed  in  the  block  queue.  It  may 
also  be  aborted  and  restarted.  In  such  a  case,  it  is  first 
delayed  for  a  certain  amount  of  time  and  then  put  in  the 
ready  queue  again.  When  a  transaction  in  the  block 
queue  is  unblocked,  it  leaves  the  block  queue  and  is 
placed  in  the  ready  queue.  Whenever  a  transaction  enters 
the  ready  queue  and  its  priority  is  higher  than  the  current 
running  transaction,  it  preempts  the  current  running 
transaction. 
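
The ready-queue behaviour described above can be sketched with a binary heap. The function names `enqueue`, `enter_ready`, and `dispatch`, and the representation of a transaction as a `(priority, name)` pair, are hypothetical; the sketch assumes that an idle processor implies an empty ready queue, and it omits the start-up and block queues.

```python
import heapq
import itertools

_seq = itertools.count()  # tie-breaker: FIFO order among equal priorities


def enqueue(ready, txn):
    """Insert txn = (priority, name); larger priority is served first."""
    heapq.heappush(ready, (-txn[0], next(_seq), txn))


def dispatch(ready):
    """Remove and return the highest-priority ready transaction, or None."""
    return heapq.heappop(ready)[2] if ready else None


def enter_ready(ready, running, newcomer):
    """Handle a transaction entering the ready queue.

    Returns the transaction that runs afterwards: the newcomer preempts
    only if its priority is strictly higher than that of the running one.
    """
    if running is None:          # assumption: idle CPU means empty ready queue
        return newcomer
    if newcomer[0] > running[0]:
        enqueue(ready, running)  # preempted transaction goes back to ready
        return newcomer
    enqueue(ready, newcomer)
    return running
```

A blocked or restarted transaction would re-enter through `enter_ready` in the same way once it is unblocked or its restart delay has elapsed.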

When  a  transaction  enters  the  start-up  queue,  it  has 
the  arrival  time,  the  deadline,  the  priority,  the  read  set, 
and  the  write  set  associated  with  it.  The  transaction  inter¬ 
arrival  time  is  a  random  variable  with  exponential  distri¬ 
bution.  The  data  objects  in  the  read  set  and  the  write  set 
are  uniformly  distributed  across  the  entire  database.  A 
transaction  consists  of  a  sequence  of  read  and  write 
operations.  A  read  operation  involves  a  concurrency- 
control  request  to  get  access  permission,  followed  by  a 
disc  input/output  (I/O)  to  read  the  data  object,  followed 
by  a  period  of  central  processing  unit  (CPU)  use  for 
processing  the  data  object.  Write  operations  are  handled 
similarly,  except  for  their  disc  I/O.  A  transaction  can  be 
discarded  at  any  time  if  its  deadline  is  missed.  Therefore, 
the  model  employs  a  hard  deadline  policy. 
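
A workload of the kind described above could be generated as in the following sketch. The function `generate_workload` and all of its parameters are hypothetical; in particular, the rule `deadline = arrival + slack * txn_size` and the fixed split between read and write sets are simplifying assumptions, not the settings used in the experiments.

```python
import random


def generate_workload(n_txns, db_size, txn_size, mean_interarrival,
                      slack, write_fraction=0.5, seed=0):
    """Generate transactions with exponential inter-arrival times and
    read/write sets drawn uniformly from the whole database."""
    rng = random.Random(seed)
    now = 0.0
    txns = []
    for i in range(n_txns):
        # expovariate takes the rate, i.e. 1 / mean inter-arrival time
        now += rng.expovariate(1.0 / mean_interarrival)
        # distinct objects, uniformly distributed across the database
        objects = rng.sample(range(db_size), txn_size)
        n_writes = int(txn_size * write_fraction)
        txns.append({
            "id": i,
            "arrival": now,
            "deadline": now + slack * txn_size,   # assumed deadline rule
            "write_set": set(objects[:n_writes]),
            "read_set": set(objects[n_writes:]),
        })
    return txns


# Ten objects out of a 100-object database, i.e. a transaction size of 10%.
workload = generate_workload(n_txns=5, db_size=100, txn_size=10,
                             mean_interarrival=2.0, slack=3.0)
```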

Transaction  size  (the  number  of  data  objects  a  transac¬ 
tion  needs  to  access)  has  been  used  as  one  of  the  key 
variables  in  the  experiments.  It  varies  from  a  small  frac¬ 
tion  up  to  a  relatively  large  portion  (10%)  of  the  data¬ 
base,  so  that  conflicts  would  occur  frequently.  The  high 
conflict  rate  allows  synchronization  protocols  to  play  a 
significant  role  in  determining  system  performance.  The 
arrival  rate  was  chosen  so  that  protocols  are  tested  in  a 
heavily  loaded  rather  than  lightly  loaded  system.  For  the 
design  of  real-time  systems,  high-load  situations  must  be 
considered.  Even  though  they  may  not  arise  frequently,  it 
is  desirable  to  have  a  system  that  misses  as  few  deadlines 
as  possible  when  such  peaks  occur.  In  other  words,  the  time
at  which  a  crisis  occurs  and  the  database  system  is  under  pressure
is  precisely  when  making  a  few  extra  deadlines  could  be 
most  important*.  The  following  summarizes  the  findings 
briefly  to  illustrate  the  performance  of  the  algorithms. 

In  Figure  1,  the  throughput  of  the  ceiling  protocol  (C),
2PL  with  priority  mode  (P),  and  2PL  without  priority
mode  (L)  is  shown  for  transactions  of  different  sizes.
The  two-phase  locking  protocol  with  priority  mode  is
also  called  the  high-priority  protocol*.  In  that  protocol,
all  data  conflicts  are  resolved  in  favour  of  the  transaction
with  higher  priority.  When  a  transaction  requests  a  lock
on  a  data  object  held  by  other  transactions  in  an  incompatible
mode,  if  the  requester's  priority  is  higher  than
that  of  all  the  lock  holders,  the  holders  are  restarted  and
the  requester  is  granted  the  lock;  if  the  requester's  prior¬ 
ity  is  lower,  it  waits  for  the  lock  holders  to  release  the 
lock. 
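
The conflict-resolution rule of the high-priority protocol can be stated in a few lines. The function name `resolve_conflict` and its return values are hypothetical; the text does not say what happens when priorities are equal, so the sketch assumes that the requester waits in that case.

```python
def resolve_conflict(requester_priority, holder_priorities):
    """Resolve an incompatible lock request under 2PL with priority mode.

    Returns "restart_holders" if the requester outranks every lock holder
    (the holders are restarted and the requester obtains the lock),
    otherwise "wait". Larger numbers mean higher priority (assumption).
    """
    if all(requester_priority > p for p in holder_priorities):
        return "restart_holders"
    return "wait"   # includes the tie case, which is an assumption
```

For example, a requester of priority 3 facing holders of priorities 1 and 2 causes both holders to be restarted, whereas a requester of priority 2 facing holders of priorities 1 and 3 waits.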


Figure  1.  Transaction  throughput

When  the  transaction  size  is  small,  there  is  little  locking
conflict,  and  problems  such  as  deadlock  and  priority
inversion  have  little  effect  on  the  overall  performance
of  a  locking  protocol.  On  the  other  hand,  when  transaction
size  becomes  large,  the  probability  of  locking  conflict
rises  rapidly.  Hence  it  would  be  expected  that  the
performance  of  protocols  will  be  dominated  by  their 
abilities  to  handle  locking  conflicts  when  the  transaction 
size  is  large. 

As  illustrated  in  Figure  1,  the  performance  of  2PL
with  or  without  priority  assignments  degrades  very  fast
when  transaction  size  increases.  On  the  other  hand,  the
ceiling  protocol  handles  locking  conflicts  well.  The  protocol
is  free  from  deadlocks  and  exhibits  the  block-at-most-once
property.  Hence  it  performs  much  better  than
2PL  when  transaction  size  is  large.  This  is  because  in  the
priority-ceiling  protocol  the  conflict  rate  is  determined 
by  ceiling  blocking  rather  than  by  direct  blocking,  and 
the  frequency  of  ceiling  blocking  is  not  sensitive  to  the 
transaction  size. 

Another  important  performance  statistic  is  the 
percentage  of  deadlines  missed  by  transactions,  as  the 
synchronization  protocol  in  real-time  database  systems 
must  satisfy  the  timing  constraints  of  individual  transac¬ 
tions.  In  the  experiments,  each  transaction's  deadline  is 
set  in  proportion  to  its  size  and  system  workload 
(number  of  transactions),  and  the  transaction  with  the 
earliest  deadline  is  assigned  the  highest  priority.  As 
shown  in  Figure  2,  the  percentage  of  deadlines  missed  by
transactions  increases  sharply  for  2PL  as  the  transaction
size  increases,  due  to  its  inability  to  deal  with  deadlock
and  to  give  preference  to  transactions  with  shorter
deadlines.  Two-phase  locking  with  priority  assignment  performs
somewhat  better,  because  the  timing  constraints  of
transactions  are  considered,  although  the  deadlock  and
priority-inversion  problems  still  handicap  its  performance.
The  ceiling  protocol  has  the  best  relative  performance
because  it  addresses  both  the  deadlock  and  priority-inversion
problems.

Figure  2.  Percentage  of  deadline-missing  transactions

ISSUES  IN  SCHEDULING  REAL-TIME  TRANSACTIONS

Deadlines  are  timing  constraints  associated  with  transac¬ 
tions.  In  real-time  database  systems,  scheduling  decisions 
are  often  directly  related  to  whether  the  transactions 
meet  or  miss  their  deadlines.  Scheduling  decisions  must 
be  made  when  the  scheduler  has  to  select  from  among  a 
collection  of  transactions  that  are  ready  to  be  started  or 
when  a  choice  has  to  be  made  between  two  or  more 
transactions  that  are  competing  for  the  same  resources 
(i.e.,  data  objects).  A  decision  to  abort  a  transaction  for  a 
later  restart  may  result  in  the  transaction  missing  its 
deadline. 

In  addition  to  deadlines,  other  kinds  of  timing  con¬ 
straints  are  associated  with  data  as  well  as  transactions  in 
real-time  database  systems.  For  example,  each  sensor 
input  could  be  indexed  by  the  time  at  which  the  sample 
was  taken.  Furthermore,  once  entered  into  the  database, 
data  may  get  old  or  become  out  of  date  if  they  are  not 
updated  within  a  certain  period.  To  quantify  this  notion 
of  ‘age’,  each  datum  is  associated  with  a  valid  interval.
Data  out  of  the  valid  interval  do  not  represent  the  current
state.  The  time  associated  with  the  data  is  the  time  at
which  the  value  is  currently  believed  to  have  been  true. 
The  valid  interval  indicates  the  time  interval  after  the 
most  recent  updating  of  a  data  object  during  which  a 
transaction  may  access  a  data  object  with  100%  degree 
of  accuracy.  What  occurs  when  a  transaction  attempts  to 
access  a  data  object  outside  of  its  valid  interval  is  depen¬ 
dent  on  the  semantics  of  data  objects  and  the  particular 
implementation.  For  some  data  objects,  for  instance,
a  read  outside  the  valid  interval  would  yield  data  values
of  0%  accuracy.  In  general,  each  data  object  is  associated
with  a  validity  curve  that  represents  its  degree  of
validity  with  respect  to  the  time  elapsed  after  the  data 
object  was  last  modified.  The  system  can  compute  the 
validity  of  data  objects  at  the  given  time,  provided  it  is 
given  the  time  of  last  modification  and  its  validity  curve. 
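
The computation of validity from the time of last modification and a validity curve can be sketched as follows. The two curve shapes, `step_curve` and `linear_decay_curve`, are hypothetical examples; the text does not prescribe any particular shape.

```python
def step_curve(valid_interval):
    """100% valid inside the valid interval, 0% outside it."""
    def curve(elapsed):
        return 1.0 if 0 <= elapsed <= valid_interval else 0.0
    return curve


def linear_decay_curve(valid_interval, decay_span):
    """100% valid inside the valid interval, then a linear fall to 0%
    over decay_span further time units (an assumed shape)."""
    def curve(elapsed):
        if elapsed < 0:
            return 0.0
        if elapsed <= valid_interval:
            return 1.0
        return max(0.0, 1.0 - (elapsed - valid_interval) / decay_span)
    return curve


def validity(now, last_modified, curve):
    """Degree of validity of a data object at time now."""
    return curve(now - last_modified)
```

A data object whose values become useless outside the valid interval corresponds to `step_curve`; an object that remains usable for approximation for some time afterwards could be modelled with a decaying curve.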

A  real-time  transaction  should  include  its  temporal
consistency  requirement,  which  specifies  the  validity  of
data  values  accessed  by  the  transaction.  For  example,  if
the  temporal  consistency  requirement  is  15,  it  indicates
that  data  objects  accessed  by  the  transaction  cannot  be
older  than  15  time  units  relative  to  the  start  time  of  the
transaction.  This  temporal  consistency  requirement  can 
be  specified  as  either  hard  or  soft,  just  like  deadlines.  If  it 
is  hard,  an  attempt  to  read  an  invalid  data  object  (out  of 
its  valid  interval)  will  cause  the  transaction  to  abort. 
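
A check of the temporal consistency requirement at read time might look like the following sketch. The exception `TemporalConsistencyViolation` and the function `check_read` are hypothetical; the age of a data object is taken as the transaction start time minus the time of its last update, and how a soft violation is handled is left to the caller.

```python
class TemporalConsistencyViolation(Exception):
    """Raised when a hard temporal consistency requirement is violated."""


def check_read(txn_start, last_modified, requirement, hard=True):
    """Return True if the data object is no older than `requirement`
    time units relative to the start time of the transaction.

    With a hard requirement a violation aborts the transaction (modelled
    here by an exception); with a soft one the caller receives False.
    """
    age = txn_start - last_modified
    ok = age <= requirement
    if not ok and hard:
        raise TemporalConsistencyViolation(
            f"data is {age} time units old, limit is {requirement}")
    return ok
```

With a requirement of 15, a transaction that starts at time 100 may read a data object last updated at time 90, but not one last updated at time 80.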

While  a  deadline  can  be  thought  of  as  providing  a  time 
interval  as  a  constraint  in  the  future,  the  temporal 
consistency  specifies  a  temporal  window  as  a  constraint 
in  the  past.  As  long  as  the  temporal  consistency  require¬ 
ment  of  a  transaction  can  be  satisfied,  the  system  must  be 
able  to  provide  an  answer  using  available  (possibly  not
up-to-date)  information.  The  answer  may  change  as 
valid  intervals  change  with  time.  In  a  distributed  data¬ 
base  system,  sensor  readings  may  not  be  applied  to  the 
database  at  the  same  time  and  may  not  be  reflected 
consistently  at  the  console  due  to  the  different  delay  in 
processing  and  communication.  A  temporal  data  model 
for  real-time  database  systems  must  therefore  be  able  to 
accommodate  the  information  that  is  partial  and  out  of 
date.  It  should  distinguish  adequately  between  ‘information
not  available  at  time  t’  and  ‘information  out  of
valid  interval  at  time  t’.

Real-time  systems  require  a  scheduler  that  should 
incorporate  timing  constraints  associated  with  transac¬ 
tions  and  data  objects  in  its  scheduling  decisions.  The 
goals  of  such  a  scheduler  in  real-time  database  systems 
are: 

•  to  minimize  the  number  of  transactions  that  miss  their
deadlines

•  to  ensure  meeting  the  timing  requirements  of  highly
critical  transactions

•  to  maximize  the  overall  transaction  accuracy  within
the  system

A  simplistic  approach  would  be  solely  to  consider  the 
minimization  of  transaction  loss  due  to  missing  dead¬ 
lines.  This  goal  by  itself,  however,  is  not  sufficient  as  the 
transactions  due  to  their  temporal  requirements  may 
have  different  degrees  of  criticalness  associated  with 
them.  A  scheduler  may  be  able  to  maximize  the  overall 
number  of  transactions  that  meet  their  deadlines  by  successfully
scheduling  the  less  critical  transactions  and
causing  the  highly  critical  transactions  to  miss  their 
deadlines.  The  failure  of  these  highly  critical  transactions 
may  be  too  costly  in  terms  of  endangering  the  safety  of 
the  entire  system.  Therefore,  it  would  be  desirable  for  the 
scheduler  to  make  an  effort  to  ensure  that  the  deadlines 
of  the  highly  critical  transactions  are  met.  In  addition,  as 
transactions  may  require  different  degrees  of  accuracy 
based  on  their  temporal  consistency  requirements  and 
the  validity  intervals  of  data  objects,  the  scheduler  must 
consider  the  overall  transaction  accuracy  within  the 
system  before  making  a  decision  for  or  against  a  transac¬ 
tion. 

An  intuitive  approach  to  achieve  only  the  first  goal 
would  be  to  assign  the  highest  priority  in  the  system  to
the  transaction  with  the  smallest  deadline.  As  this  tran¬ 
saction  is  at  the  highest  risk  of  missing  its  deadline,  it 
would  be  favoured  in  the  scheduling  decision.  This 
approach  is  not  acceptable,  however,  because  it  fails  to 
consider  other  important  factors.  For  example,  it  is  possible
that  the  transaction  is  so  close  to  its  deadline  that  it  is
almost  certain  that  it  would  miss  its  deadline  anyway. 
Making  a  decision  in  favour  of  this  transaction  may  lead 
to  the  competing  transaction  missing  its  deadline,  result¬ 
ing  in  a  poor  performance.  Therefore,  the  scheduler  must 
consider  the  feasibility  of  meeting  the  deadline  of  the 
transaction  in  making  a  decision. 
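
One way to take feasibility into account is to filter out transactions that can no longer meet their deadlines before applying the earliest-deadline rule. The function `pick_next` and the dictionary keys `deadline` and `remaining` are hypothetical, and the sketch assumes that an estimate of the remaining execution time is available, which the text does not guarantee.

```python
def pick_next(ready, now):
    """Choose the earliest-deadline transaction among those that can still
    finish in time; return None if no ready transaction is feasible.

    Each transaction is a dict with "name", "deadline", and "remaining"
    (estimated remaining execution time, an assumed input).
    """
    feasible = [t for t in ready if now + t["remaining"] <= t["deadline"]]
    if not feasible:
        return None
    return min(feasible, key=lambda t: t["deadline"])


ready = [
    {"name": "A", "deadline": 10, "remaining": 5},   # cannot finish by 10
    {"name": "B", "deadline": 20, "remaining": 6},
]
```

At time 8, a pure earliest-deadline rule would favour A, which is certain to miss its deadline; the feasibility test selects B instead.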

It  is  proposed  that  the  goals  mentioned  above  may  be 
achieved  by  having  the  scheduler  consider  the  deadlines, 
the  criticalness,  and  the  temporal  consistency  levels  that 
are  associated  with  transactions.  A  set  of  scheduling 
algorithms  that  consider  each  one  of  these  elements  in 
the  scheduling  decisions  has  been  developed.  It  has  been 
shown  that  using  these  algorithms  could  reduce  the 
number  of  deadline-missing  transactions  and  meet  the 
temporal  consistency  requirements'^.  For  real-time  tran¬ 
sactions,  it  is  necessary  to  define  an  appropriate  notion 
of  correctness,  and  investigate  new  techniques  to  guaran¬ 
tee  the  desired  level  of  correctness  while  increasing  the 
performance  of  the  system  by  using  the  semantic  know¬ 
ledge  of  transactions  and  a  temporal  data  model.  A 
multiversion  data  object  is  one  approach  for  exploiting 
the  semantic  information  of  real-time  transactions  and 
temporal  data  models.  In  a  system  with  multiple  versions 
of  data,  each  write  operation  on  a  data  object  produces  a 
new  version  instead  of  overwriting  the  old  version. 
Hence,  for  each  read  operation,  the  system  selects  an 
appropriate  version  to  read,  enjoying  the  flexibility  of 
controlling  the  order  of  read  and  write  operations.  One 
of  the  issues  that  needs  further  study  is  methods  to  spe¬ 
cify  appropriate  correctness  requirements  of  real-time 
transactions  by  their  timing  constraints  and  the  data 
objects  they  need  to  access. 
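
A multiversion data object can be sketched as a list of timestamped versions. The class `MultiversionObject` is hypothetical, and the selection policy shown (the most recent version written at or before the requested time) is only one possible choice of an ‘appropriate’ version.

```python
import bisect


class MultiversionObject:
    """Keeps every written version, ordered by write timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, timestamp, value):
        """Add a new version instead of overwriting the old one."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def read(self, as_of):
        """Return the latest version written at or before as_of,
        or None if no such version exists (assumed selection policy)."""
        i = bisect.bisect_right(self._timestamps, as_of)
        if i == 0:
            return None
        return self._values[i - 1]


obj = MultiversionObject()
obj.write(10, "a")
obj.write(20, "b")
```

A read as of time 15 then returns the older version even though a newer one exists, which is the flexibility in ordering read and write operations referred to above.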

CONCLUSIONS 

In  real-time  database  systems,  transactions  must  be  sche¬ 
duled  to  meet  their  timing  constraints.  In  addition,  the 
system  should  support  predictable  behaviour,  such  that
the  possibility  of  critical  tasks  missing  their  deadlines  can
be  reported  ahead  of  time,  before  the  deadlines  expire.
The  priority-ceiling  protocol  is  one  approach  to  achieve 
a  high  degree  of  schedulability  and  system  predictability. 
It  has  been  argued  that  this  protocol  might  be  appropriate
for  real-time  transaction  scheduling,  as  it  is  stable
over  a  wide  range  of  transaction  sizes  and,  compared
with  the  two-phase  locking  protocol,  it  reduces  the 
number  of  deadline-missing  transactions. 

There  are  many  technical  issues  associated  with  real¬ 
time  transaction  scheduling  that  need  further  investi¬ 
gation.  In  the  priority-ceiling  protocol  and  many  other 
database  scheduling  algorithms,  preemption  in  locking  is 
usually  not  allowed.  To  reduce  the  number  of  deadline-missing
transactions,  however,  preemption  may  need  to
be  considered.  The  preemption  decision  in  a  real-time
database  system  must  be  made  carefully,  and  it  should 
not  necessarily  be  based  only  on  relative  deadlines'^.  This 
is  because  preemption  implies  not  only  that  the  work  done
by  the  preempted  transaction  must  be  undone,  but  also 
that  later  on,  if  restarted,  it  must  redo  the  work.  The 
resultant  delay  and  the  wasted  execution  may  cause  one 
or  both  of  these  transactions,  as  well  as  other  transac¬ 
tions,  to  miss  deadlines. 

Even  though  data  objects  out  of  their  valid  interval  do 
not  represent  the  current  state,  they  might  be  used  for 
approximation.  Methods  to  specify  the  temporal- 
consistency  requirement  of  transactions,  and  to  use  valid 
intervals  of  data  objects  in  determining  the  degree  of 
consistency  of  transactions  that  access  them,  are  relatively
unexplored  and  remain  an  important  problem.  Several
approaches  to  designing  scheduling  algorithms  for  real¬ 
time  transactions  have  been  proposed*-’-*  '^,  but  their 
performance  in  distributed  environments  has  not  been 
studied.  The  author  is  currently  working  on  implement¬ 
ing  scheduling  algorithms  for  distributed  real-time  tran¬ 
sactions,  using  his  prototyping  environment. 

This  book  includes  much  information
about  some  rather  unusual  computer- 
aided  software  engineering  (CASE) 
tools,  embedded  in  what  would  appear 
to  be  Lewis'  university  course  in  software 
engineering  generally. 

First,  the  coverage  of  CASE  tools  par¬ 
ticularly:  all  CASE  tools  discussed  run 
on  and,  for  the  most  part,  only  on  Apple 
Macintoshes.  Some  are  commercial  products;
some  are  local  products  of  Oregon
State  University  at  Corvallis,  USA,  the
results  of  student  projects  and  appar¬ 
ently  available  from  the  author.  A  few 
tools  are  uncredited,  but  obviously  exist 
because  they,  like  most  others  in  the  text, 
are  discussed  to  the  accompaniment  of 
endless  illustrations  of  their  many  menus 
and  output  screens. 

In  several  cases  this  merely  underlines 
the  difficulty  of  using  necessarily  linear 
text  descriptions  of  highly  nonlinear  con¬ 
cepts.  In  all  cases  the  amount  of  detail 
devoted  to  'which  mouse  button  to  push' 
seems  out  of  place  in  what  is  not  a  user 
manual  but  a  textbook. 

Second,  the  coverage  of  software  engi¬ 
neering  in  general:  the  best  I  can  say  is 
‘uneven’,  or  perhaps  'idiosyncratic'; 
important  issues  are  left  out  or  glossed 
over,  others  are  laboured  to  death.  In  the 
‘left-out’  category  I  would  put  at  the  top 
of  the  list  careful  discussion  of  the  differ¬ 
ence  between  specification  and  design 
models.  There  is  no  discussion  of  ‘imple¬ 
mentation-free'  specification  models,  of 
the  concept  of  domain  modelling,  or  of 
how  to  move  from  a  formal  problem 
description  to  a  design. 

The  rather  cursory  discussion  of  data¬ 
flow  modelling  merely  reinforces  this 
lack  of  distinction.  For  instance,  what  is 
meant  to  be  a  specification  dataflow 
diagram  (for  a  problem  that  is  about  cost 
estimation)  includes  stores  with  labels 
such  as  'RAM',  'file',  and  'printer'. 

Even  more  fundamentally,  Lewis
seems  to  feel  that  a  method  without  a
CASE  tool  is  a  fish  without  a  bicycle  — 
that  it  will  not  get  very  far.  I  was  surprised.
Many  times  I  have  clarified  my
thinking  about  a  complex  subject  matter
by  drawing  a  simple  data  model  on  the
back  of  a  fish.  I've  used  dataflow
diagrams  to  help  discuss  problems  with 
fish.  I  make  sense  of  communications
protocols  by  turning  them  into  state
models.  I  have  borrowed  ‘how  to  start' 
heuristics  from  Yourdon,  Ward,  or 
Shlaer  and  Mellor  when  I  am  beginning  a 
model  of  a  problem.  Methods,  and  their 
component  notations,  are  tools  for 
thought.  Many  companies  have  bought 
CASE  tools,  but  not  taught  their 
employees  how  to  think  with  the  ideas 
behind  them.  Result:  disaster.  For  years, 
electronics  engineers  successfully  used 
finite-state  machine  notations  to  model 
complex  digital  problems  without  the 
help  of  computer-aided  engineering. 
They  may  have  occasionally  made  mistakes,
but  their  digital  design  work  could
not  have  been  done  without  the  disci¬ 
pline  of  paper-and-pencil  work  with 
decision  tables  and  state  models. 
'Computer-aided'  is  not  a  synonym  for 
'useful'. 

Other  highly  contentious  and,  I  think,
wrong  claims  are  made.  For  instance,
‘There  are  no  guide-lines  for  combining 
methods  when  appropriate'*.  Or  'None 
of  the  methods  guarantee  incremental 
correctness  of  design'  ('guarantee', 
maybe  not,  but  'help  verify'  or  'assist  in', 
certainly). 

There  is  some  confusion  between 
methods,  notations  advocated  by  them, 
and  tools  used  to  expedite  the  use  of 
either.  A  naive  student  might  assume  for 
instance  that  data  dictionaries  and  the
use  of  a  data  composition  notation  are
features  of  Anatool,  as  opposed  to  a  realisation
of  a  concept  common  to  many
methods. 

The  coverage  of  many  major  methods 
and  notations  is  half-hearted.  Any  book 
that  includes  dataflow  diagrams  with  no 
arrows  on  any  of  the  data  flows  is  proba¬ 
bly  not  taking  the  concept  behind  the 
diagrams  very  seriously.  There  is  virtually
no  coverage  of  data-modelling/
entity-relationship-diagram  notations
and  the  methods  that  use  them.  The  few 
pages  devoted  to  them  use  the  eccentric 
term  ‘entity  category  relation',  which 
(notation?  method?)  is  introduced  with¬ 
out  discussion  or  definition.  The  reader  is 
given  the  impression  that  such  notations 
and  the  models  built  with  them  are  only,
or  mostly,  to  do  with  the  physical  design 
of  databases. 

‘Object  oriented'  is  interpreted  as 
encapsulation,  and  as  a  design  issue. 
There  is  no  coverage  or  mention  of 
object-oriented  or  domain  analysis. 
Inheritance  is  only  mentioned  once  or 
twice,  and  then  not  helpfully;  ‘subtype/ 
supertype’  appears  to  be  equated  with
‘instance/class'.

Real-time  issues  are  relegated  to  one 
two-paragraph  programmatic  sidebar. 
There  is  little  coverage  of  methods  or 
notations  used  for  real-time  analysis  and 
design.  State  models  are  not  discussed  or 
taught,  but  only  briefly  mentioned,  and 
then only as a notation for communicating
with a user-interface design tool.
Jackson Structured Programming (JSP)
is discussed, but not Jackson Structured
Design (JSD). Just to keep things
interesting, JSP is called JSD in the text.

Although the preface claims a readership
of ‘practitioners who manage,
design, code, test and market ...’ software,
most of them will probably fall
asleep  over  the  chapters  on  mathematical 
verification  and  software  metrics.  The 
former  is  fuzzily  written  and  too  shallow 
to  teach  a  beginner  anything  useful.  The 
latter  leads  the  eager  practitioner  —  as 
software  metrics  often  seem  to  — 
nowhere  much.  Nor  do  the  pages  on  the 
mathematics  of  reliability  statistics  and 
cost  estimates.  The  pages  'adapted  from 
the  IEEE  Standard  for  Configuration 
Management  Plan'  teach  nothing  and 
are  likely  to  induce  terminal  coma. 

Lewis'  heart  would  seem  to  be  in 
design  and  implementation;  his  coverage 
of  lower-CASE  tools  and  the  concepts 
behind  them  is  excellent.  He  makes  good 
use of quantitative data (I had not before
run into Card's wonderful result
suggesting that small modules are more
expensive to maintain!). Testing, programming
style  and  complexity,  and  coding 
standards  are  covered  well. 

The  text  occasionally  gets  bogged 
down  in  programming  details  of  no 
general consequence. There are pages
devoted to Macintosh system programming,
down even to hex listings of icon
and mask resources.

Coverage of project management
issues is deeper than what usually gets
into an introductory software engineering
text, and much more than one would
expect from a book purportedly about
CASE. My main protest would be the
use made of Boehm's spiral model. Lewis
leaves out all mention of risk, and uses
the spiral model as a mechanism for
discussing rapid prototyping using
very-high-level code generators. Boehm's
risk-driven model is easy to understand, and I
believe really should be discussed
whenever the concept of prototyping comes
up.

The book could have benefited from
better production. Graphics are derived

1. Introduction

A real-time database system (RTDBS) differs from a conventional database system because,
in addition to the consistency constraints of the database, timing constraints of individual
transactions need to be satisfied. In order to provide a timely response for queries and
updates while maintaining the consistency of data, real-time concurrency control should involve
efficient integration of ideas from both database concurrency control and real-time
scheduling. Various real-time concurrency control protocols have been proposed which employ
either a pessimistic or an optimistic approach to concurrency control.

In this paper, we present two hybrid real-time concurrency control protocols which
combine pessimistic and optimistic approaches to concurrency control in order to control
blocking and aborting in a more effective manner. One protocol is a combination of optimistic
concurrency control and locking, and the other is a combination of optimistic concurrency
control and timestamp ordering.

2. Integrated Real-Time Locking Protocol

Concurrency control protocols induce a serialization order among conflicting transactions.
For a concurrency control protocol to accommodate timing constraints of transactions, the
serialization order it produces should reflect the priority of transactions. However, this is
often hindered by the past execution history of transactions. A higher priority transaction
may have no way to precede a lower priority transaction in the serialization order due to
previous conflicts. For example, let T_H and T_L be two transactions with T_H having a higher
priority. If T_L writes a data object x before T_H reads it, then the serialization order between
T_H and T_L is determined as T_L → T_H. T_H can never precede T_L in the serialization order
as long as both reside in the execution history. Most of the current (real-time) concurrency
control protocols resolve this conflict either by blocking T_H until T_L releases the writelock
or by aborting T_L in favor of the higher priority transaction T_H. Blocking of a higher
priority transaction due to a lower priority transaction is contrary to the requirement of
real-time scheduling. Aborting is also not desirable because it degrades the system performance
and may lead to violations of timing constraints. Furthermore, some aborts can be wasteful
when the transaction which caused the abort is aborted due to another conflict. The
objective of our first protocol is to avoid such unnecessary blocking and aborting.

This work was supported in part by ONR, by NRaD, by DOE, and by IBM.

In this protocol, called Integrated Real-Time Locking, a priority-dependent locking
protocol is used to adjust the serialization order of active transactions dynamically. Its goal
is to execute high priority transactions first so that they are not blocked by uncommitted
lower priority transactions, while keeping lower priority transactions from being aborted
even in the face of a conflict. This adjustment of the serialization order can be considered
as a mechanism to support real-time scheduling.

This protocol is an integrated protocol because it uses different solutions for read/write
(rw) and write/write (ww) synchronization, and integrates the solutions of the two
subproblems to yield a solution to the entire problem (Bernstein, Hadzilacos and Goodman 1987).

The protocol is similar to optimistic concurrency control (OCC) in the sense that each
transaction has three phases, but unlike the optimistic method, there is no validation phase.
This protocol's three phases are read, wait, and write. The read phase is similar to that
of OCC wherein a transaction reads from the database and writes to its local workspace.
In this phase, however, conflicts are also resolved by using transaction priority. While other
optimistic real-time concurrency control protocols resolve conflicts in the validation phase,
this protocol resolves them in the read phase. In the wait phase, a transaction waits for
its chance to commit. Finally, in the write phase, updates are made permanent to the database.


2.1.  Read  Phase 

The read phase is the normal execution of a transaction except that all writes are on private
data copies in the local workspace of the transaction instead of on data objects in the
database. Such write operations are called prewrites. The prewrites are useful when a
transaction is aborted, in which case the data in the local workspace is simply discarded. No
rollback is required.
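
The effect of prewrites can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class name Workspace and its methods are hypothetical, the database is modeled as a plain dictionary, and locking is ignored entirely.

```python
class Workspace:
    """Private data copies of one transaction (hypothetical helper)."""

    def __init__(self, database):
        self.database = database   # shared dict: object name -> committed value
        self.local = {}            # prewrites, invisible to other transactions

    def read(self, key):
        # A transaction sees its own prewrite first, else the committed value.
        return self.local[key] if key in self.local else self.database[key]

    def prewrite(self, key, value):
        self.local[key] = value    # the database itself is left untouched

    def abort(self):
        self.local.clear()         # no rollback: the private copies are dropped

    def write_phase(self):
        self.database.update(self.local)   # make the updates permanent
        self.local.clear()
```

Because an aborted transaction has never touched the database, abort() only has to empty the local dictionary.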

In this phase read-prewrite and prewrite-read conflicts are resolved using a priority based
locking protocol. A transaction must obtain the corresponding lock before it reads or
prewrites. According to the priority locking protocol, higher priority transactions must
complete before lower priority transactions. If a low priority transaction does complete
before a high priority transaction, it is required to wait until it is sure that its
commitment will not lead to the higher priority transaction being aborted.

Suppose T_H and T_L are two active transactions and T_H has higher priority than T_L. There
are four possible conflicts as follows.

(1) r_{T_H}[x] followed by pw_{T_L}[x]. The resulting serialization order is T_H → T_L, which
satisfies the priority order, so the serialization order does not need to be adjusted.

(2) pw_{T_H}[x] followed by r_{T_L}[x]. Two different serialization orders can be induced with this
conflict: T_L → T_H with immediate reading, and T_H → T_L with delayed reading.
Certainly, the latter should be chosen for priority scheduling. The delayed reading in this
protocol means blocking of r_{T_L}[x] by the writelock of T_H on x.

(3) r_{T_L}[x] followed by pw_{T_H}[x]. The resulting serialization order is T_L → T_H, which
violates the priority order. If T_L is in its read phase, abort T_L. If T_L is in its wait phase,
avoid aborting T_L until T_H commits in the hope that T_L gets a chance to commit before
T_H does. If T_H commits, T_L is aborted. But if T_H is aborted by some other conflicting
transaction, then T_L is committed. With this policy, we can avoid unnecessary and
useless aborts, while satisfying priority scheduling.

(4) pw_{T_L}[x] followed by r_{T_H}[x]. Two different serialization orders can be induced from this
conflict: T_H → T_L with immediate reading, and T_L → T_H with delayed reading. If T_L
is in its write phase, delaying T_H is the only choice. This blocking is not a serious
problem for T_H because T_L is expected to finish writing x soon. T_H can read x as soon
as T_L finishes writing x in the database, not necessarily after T_L completes the whole
write phase. If T_L is in its read or wait phase, choose immediate reading.
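
The four cases can be condensed into a small decision function. The following is a sketch for illustration only: the function name, its parameters, and the returned action strings are hypothetical and chosen for this example; only the case analysis itself comes from the text above.

```python
def classify_conflict(first_op, first_is_high, low_phase):
    """Classify a conflict on one data object between T_H and T_L.

    first_op      -- "r" or "pw": the earlier of the two conflicting operations
    first_is_high -- True if that earlier operation was issued by T_H
    low_phase     -- current phase of T_L: "read", "wait" or "write"
    Returns (case number, action); the action strings are illustrative.
    """
    if first_op == "r" and first_is_high:
        return 1, "no adjustment"          # order T_H -> T_L already holds
    if first_op == "pw" and first_is_high:
        return 2, "block T_L"              # delayed reading, order T_H -> T_L
    if first_op == "r":
        # T_L read x first, so T_L -> T_H would violate the priority order.
        if low_phase == "read":
            return 3, "abort T_L"
        if low_phase == "wait":
            return 3, "defer abort of T_L"  # decided when T_H commits or aborts
        raise ValueError("T_L holds no read lock in its write phase")
    if first_op == "pw":
        if low_phase == "write":
            return 4, "block T_H"          # T_L is committed, order T_L -> T_H
        return 4, "immediate read"         # T_H reads the old value, T_H -> T_L
    raise ValueError(f"unknown operation: {first_op!r}")
```

For example, classify_conflict("pw", False, "wait") selects immediate reading, so that T_H is not blocked by an uncommitted T_L.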

As transactions are being executed and conflicting operations occur, all the information
pertaining to the induced dependencies in the serialization order needs to be retained. In
order to maintain this information, we associate the following with each transaction: two
sets, before_trset and after_trset, and a count, before_cnt. The before_trset (respectively,
after_trset) of a transaction contains all the active lower priority transactions that must
precede (respectively, follow) this transaction in the serialization order. The before_cnt
of a transaction is the number of higher priority transactions that precede this transaction
in the serialization order. When a conflict occurs between two transactions, their dependency
is determined and the values of their before_trset, after_trset, and before_cnt are changed
accordingly.

By  summarizing  what  we  discussed  above,  we  define  the  locking  protocol  as  follows: 

LP1. Transaction T requests a read lock on data object x.

    for all transactions t with writelock(t,x) do
        if (priority(t) > priority(T) or t is in write phase)   /* Case 2, 4 */
        then deny the lock and exit;
        endif
    enddo

    for all transactions t with writelock(t,x) do               /* Case 4 */
        if t is in before_trset_T then abort t;
        else if (t is not in after_trset_T)
             then
                 include t in after_trset_T;
                 before_cnt_t := before_cnt_t + 1;
             endif
        endif
    enddo

    grant the lock;

LP2. Transaction T requests a write lock on data object x.

    for all transactions t with readlock(t,x) do
        if priority(t) > priority(T)
        then                                                    /* Case 1 */
            if (T is not in after_trset_t)
            then
                include T in after_trset_t;
                before_cnt_T := before_cnt_T + 1;
            endif
        else
            if t is in wait phase                               /* Case 3 */
            then
                if (t is in after_trset_T)
                then abort t;
                else
                    include t in before_trset_T;
                endif
            else if t is in read phase
                 then abort t;
                 endif
            endif
        endif
    enddo

    grant the lock;
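
For readers who prefer executable code, LP1 and LP2 can be transcribed as follows. This is a sketch under stated assumptions, not the implementation evaluated in this work: the names Txn and LockTable are hypothetical, a denied request simply returns False instead of blocking, a transaction's own locks are skipped, and an abort decrements the before_cnt of the transactions in the aborted transaction's after_trset (an assumption drawn from the commit condition of the wait phase).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                 # eq=False keeps identity hashing for sets
class Txn:
    name: str
    priority: int
    phase: str = "read"              # "read", "wait", "write" or "aborted"
    before_trset: set = field(default_factory=set)
    after_trset: set = field(default_factory=set)
    before_cnt: int = 0


class LockTable:
    def __init__(self):
        self.readers = {}            # data object -> set of Txn with a read lock
        self.writers = {}            # data object -> set of Txn with a write lock

    def abort(self, t):
        t.phase = "aborted"
        for u in t.after_trset:      # t no longer precedes these transactions
            u.before_cnt -= 1
        for holders in list(self.readers.values()) + list(self.writers.values()):
            holders.discard(t)

    def read_lock(self, T, x):       # LP1
        writers = self.writers.setdefault(x, set())
        for t in writers:
            if t is not T and (t.priority > T.priority or t.phase == "write"):
                return False         # cases 2 and 4: deny the lock
        for t in list(writers):      # case 4 with immediate reading
            if t is T:
                continue
            if t in T.before_trset:
                self.abort(t)
            elif t not in T.after_trset:
                T.after_trset.add(t)
                t.before_cnt += 1
        self.readers.setdefault(x, set()).add(T)
        return True

    def write_lock(self, T, x):      # LP2
        for t in list(self.readers.setdefault(x, set())):
            if t is T:
                continue
            if t.priority > T.priority:          # case 1
                if T not in t.after_trset:
                    t.after_trset.add(T)
                    T.before_cnt += 1
            elif t.phase == "wait":              # case 3, T_L in wait phase
                if t in T.after_trset:
                    self.abort(t)
                else:
                    T.before_trset.add(t)
            elif t.phase == "read":              # case 3, T_L in read phase
                self.abort(t)
        self.writers.setdefault(x, set()).add(T)
        return True
```

A real scheduler would queue a denied read request and retry it when the conflicting write lock is released; that machinery is omitted here.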

2.2. Wait Phase

The wait phase allows a transaction to wait until it can commit. A transaction in the wait
phase can commit if all transactions with higher priority that must precede it in the
serialization order are either committed or aborted. Since the before_cnt of a transaction keeps
track of the number of such transactions, the transaction can commit only if its before_cnt
becomes zero. A transaction in the wait phase may be aborted for two reasons: if a
higher priority transaction requests a conflicting lock, or if a higher priority transaction
that must follow this transaction in the serialization order commits. Once a transaction
in its wait phase finds a chance to commit, it commits, switches to its write phase and releases
all readlocks. The transaction is assigned a final timestamp which is the absolute
serialization order.
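
The commit rule of the wait phase can be sketched as below. The class and function names are hypothetical, and lock release and timestamp assignment are left out. The sketch assumes that finishing a transaction (by commit or abort) decrements the before_cnt of every transaction in its after_trset, and that committing a transaction aborts the waiting members of its before_trset.

```python
class WaitingTxn:
    def __init__(self, name):
        self.name = name
        self.phase = "wait"          # "wait", "write" (committed) or "aborted"
        self.before_trset = set()    # lower priority txns that must precede self
        self.after_trset = set()     # lower priority txns that must follow self
        self.before_cnt = 0          # higher priority txns that must precede self


def finish(t, committed):
    """Record the outcome of t and update the transactions that depend on it."""
    t.phase = "write" if committed else "aborted"
    for u in t.after_trset:
        u.before_cnt -= 1            # one fewer predecessor to wait for
    if committed:
        for u in t.before_trset:
            if u.phase == "wait":    # u had to precede t but did not commit in time
                finish(u, committed=False)


def try_commit(t):
    """Commit t if it is waiting and no higher priority predecessor is pending."""
    if t.phase == "wait" and t.before_cnt == 0:
        finish(t, committed=True)
        return True
    return False
```

With this rule a low priority transaction that has finished its work commits as soon as every higher priority transaction ordered before it has committed or aborted.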


2.3.  Write  Phase 

Once a transaction is in the write phase, it is considered to be committed. All committed
transactions can be serialized by their final-timestamp order. In the write phase, the only
work of a transaction is making all its updates permanent in the database. Data items in
local workspaces are copied into the database. The write requests of each transaction are
sent to the data manager, which carries out the write operations in the database.
Transactions submit write requests along with their final timestamps. After each write operation,
the corresponding write lock is released. In order to resolve write-write conflicts here,
we apply Thomas' Write Rule (TWR) (Bernstein et al. 1987), which just ignores late write
requests rather than aborting them.
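
A minimal sketch of how a data manager might apply Thomas' Write Rule to write requests tagged with final timestamps is given below. The class DataManager and its fields are hypothetical; the sketch covers only the write-write case, since read-write conflicts are handled by locking in the read phase.

```python
class DataManager:
    """Applies write requests in final-timestamp order (hypothetical helper)."""

    def __init__(self):
        self.value = {}   # data object -> current value
        self.wts = {}     # data object -> final timestamp of the last applied write

    def write(self, key, value, final_ts):
        # Thomas' Write Rule: a write older than the one already applied
        # is ignored instead of causing an abort.
        if final_ts < self.wts.get(key, float("-inf")):
            return False
        self.value[key] = value
        self.wts[key] = final_ts
        return True
```

Ignoring the late write is safe because its value would have been overwritten anyway by the write carrying the larger final timestamp.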

3. Hybrid Timestamp Interval Protocol

3.1.  Key  Ideas 

This  protocol  is  a  combination  of  OCC  and  timestamp  ordering.  One  serious  problem  of 
OCC  is  that  of  wasted  resources.  Because  data  conflicts  are  detected  and  resolved  only 
during  the  validation  phase,  transactions  can  end  up  aborting  after  having  used  resources 
and  time  for  most  of  the  transaction’s  execution.  The  situation  becomes  even  worse  because 
previously  performed  work  has  to  be  redone  when  the  transaction  is  restarted.  The  problem 
of  the  wasted  resources  and  time  becomes  even  more  serious  for  real-time  transaction 
scheduling,  because  it  reduces  the  chances  of  meeting  the  deadlines  of  transactions. 

Another problem of OCC is unnecessary aborts. When a transaction is ready to commit,
it is checked whether this transaction is involved in any nonserializable execution. This
validation test is usually conducted based on the read sets and write sets of transactions,
rather than on actual execution order. Hence sometimes the validation process using the
read sets and write sets erroneously concludes that a nonserializable execution has occurred,
even though it has not in actual execution. The problem of unnecessary aborts is serious
because it results in a waste of resources and time.

The problem of wasted resources is partly remedied with a forward validation scheme,
because the validation test is conducted against active transactions in their read phase
(Haritsa, Carey and Livny 1990; Huang, Stankovic, Ramamritham and Towsley 1991). Early
detection and resolution of conflicts can reduce the wasted resources and time. Our
protocol presented here also utilizes OCC with forward validation to take advantage of
the early detection and resolution of nonserializable executions. Furthermore, this protocol
employs the notion of dynamic timestamp allocation (Bayer, Elhardt, Heigert and Reiser
1982) and dynamic adjustment of serialization order using timestamp intervals (Boksenbaum,
Cart, Ferrie and Pons 1987). With these, the ability of early detection and resolution of
nonserializable execution is improved, and unnecessary aborts are avoided.

3.1.1. OCC with forward validation. The execution of each transaction in this protocol
consists of three phases: read, validation, and write, as in other OCC protocols. This protocol
uses a forward validation scheme, rather than a backward validation scheme. As mentioned
earlier, in forward validation, the validation test is conducted against active transactions
in their read phase. When a conflict is detected, either the validating transaction or the
conflicting active transaction can be aborted. It is this property that makes OCC with
forward validation flexible and allows it to be easily combined with the priority mechanism.
The phase-dependent control of OCC and the property of the forward validation scheme
provide a framework for the following components of the protocol.

3.1.2. Categories of conflicting transactions. Since this protocol uses forward validation
conducted against active transactions, when a validation test is performed for a transaction,
say T_v, active transactions in the system can be divided into several sets according to their
execution history (with respect to that of T_v). First, the set of active transactions is
divided into two sets: a conflicting set, which contains transactions in conflict with T_v, and
a nonconflicting set, which contains transactions not in conflict with T_v. The conflicting
set can be further divided into two sets: a Reconcilably Conflicting (RC) set and an
Irreconcilably Conflicting (IC) set. Transactions in the RC set are in conflict with T_v, but the
conflicts are reconcilable, i.e., serializable. However, transactions in the IC set are in
conflict with T_v, and the conflicts are irreconcilable, i.e., nonserializable. The formal
description of the conditions to categorize these sets of active transactions and the definitions of
the terms such as reconcilable conflict and irreconcilable conflict can be found in (Son,
Lee and Lin 1992).

The  RC  transactions  do  not  have  to  be  aborted,  but  their  execution  histories  have  to  be 
adjusted  with  the  timestamp  interval  facility  of  this  protocol.  The  IC  transactions  should 
be  handled  with  priority-based  real-time  conflict  resolution  schemes. 

3.1.3. Dynamic timestamp allocation. Another important aspect of this protocol is dynamic
timestamp allocation. Most timestamp-based concurrency control protocols use a static
timestamp allocation scheme, i.e., each transaction is assigned a timestamp value at its
startup time, and a total ordering instead of a partial ordering is built up. This total
ordering does not reflect any actual conflict. Hence, it is possible that a transaction is aborted
when it requests its first data access (Bayer et al. 1982). Besides, the total ordering of all
transactions is too restrictive, and degrades the degree of concurrency considerably. With
dynamic timestamp allocation, the serialization order among transactions is dynamically
constructed on demand whenever actual conflicts occur. Only the necessary partial ordering
among transactions is constructed instead of a total ordering from the static timestamp
allocation.

This dynamic timestamp allocation scheme is possible because OCC provides a
phase-dependent structure of transaction execution. During the read phase, a transaction gradually
builds its serialization order with respect to committed transactions on demand whenever
a conflict with such transactions occurs. Only when the transaction commits (after passing
the validation test) is its permanent timestamp order (i.e., the final serialization order)
determined.

3.1.4. Dynamic adjustment of serialization order with timestamp intervals. The dynamic
timestamp allocation scheme is made more efficient with a timestamp interval facility
(Boksenbaum et al. 1987). More flexibility to adjust serialization order can be obtained
using a timestamp interval (initially, the entire range of the timestamp space) assigned to
each transaction instead of a single value for the timestamp. The timestamp intervals of active
transactions preserve the partial ordering constructed by serializable execution. The
timestamp interval of each transaction is adjusted (shrunk) whenever the transaction reads or
writes a data object to preserve the serialization order induced by committed transactions.
When the timestamp interval of a transaction shuts out, it means the transaction has been
involved in a nonserializable execution, and the transaction should be restarted. With this
facility, it is possible to detect and resolve nonserializable execution early in the read phase.
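
The shrinking of a timestamp interval can be illustrated with a small sketch. It assumes integer timestamps and a closed interval; the class name Interval and its methods are hypothetical, and the exact bounds applied on each read or write are not taken from the protocol description.

```python
import math


class Interval:
    """Closed timestamp interval [lo, hi]; initially the whole timestamp space."""

    def __init__(self, lo=-math.inf, hi=math.inf):
        self.lo, self.hi = lo, hi

    def after(self, ts):
        """Keep only timestamps greater than ts (serialize after ts)."""
        self.lo = max(self.lo, ts + 1)

    def before(self, ts):
        """Keep only timestamps smaller than ts (serialize before ts)."""
        self.hi = min(self.hi, ts - 1)

    def shut_out(self):
        """True when no timestamp is left, i.e. the transaction must restart."""
        return self.lo > self.hi
```

For instance, a transaction constrained to follow timestamp 10 and to precede timestamp 20 keeps the interval [11, 19]; a further constraint to precede timestamp 10 empties the interval, which signals a nonserializable execution.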

When a transaction, say T_v, commits after its validation phase, the timestamp intervals
of those transactions categorized as reconcilably conflicting are adjusted, i.e., the serialization
order between the validating transaction T_v and its RC transactions is determined. Since
the permanent serialization order (final timestamp) of these active transactions is not
determined, all we have to do is determine the partial ordering between T_v and these active
transactions by adjusting their timestamp intervals. Therefore these transactions do not have
to be aborted even though they are in conflict with the committed transaction, i.e.,
unnecessary aborts are avoided, unlike other OCC protocols.

3.1.5. Real-Time Conflict Resolution. An irreconcilable conflict means
a nonserializable execution, so it must be resolved by aborting one of the transactions
involved. As mentioned, since this protocol is based on OCC with a
forward validation scheme, either the validating transaction or the conflicting active
transaction can be aborted. To determine which transaction to abort, we can employ the
following priority-based conflict resolution schemes (Haritsa et al. 1990; Huang et al. 1991).

•  commit: When a transaction reaches the validation phase, it commits and notifies all
the IC transactions. These IC transactions are immediately restarted.

•  priority abort: When a transaction reaches its validation phase, it is aborted if its priority
is less than that of all the IC transactions. If not, it commits and all the IC transactions
are restarted immediately as with the commit scheme.

•  priority sacrifice: When a transaction reaches its validation phase, it is aborted if at
least one IC transaction has a higher priority than the validating transaction; otherwise
it commits and all the IC transactions are restarted immediately.

•  priority wait: When a transaction reaches its validation phase, if its priority is not the
highest among the IC transactions, it waits for the IC transactions with higher priority
to complete.
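
The four schemes differ only in the decision taken for the validating transaction, which the following sketch makes explicit. The function name and return values are hypothetical. A larger number denotes a higher priority, and the sketch assumes that under priority wait a transaction with no higher priority IC transaction commits directly, which the description above leaves implicit.

```python
def resolve_ic_conflict(scheme, validating_priority, ic_priorities):
    """Return "commit", "abort" or "wait" for the validating transaction.

    scheme              -- "commit", "priority abort", "priority sacrifice"
                           or "priority wait"
    validating_priority -- priority of the validating transaction
    ic_priorities       -- priorities of its IC transactions (may be empty)
    When the result is "commit", all IC transactions are restarted.
    """
    if scheme not in ("commit", "priority abort",
                      "priority sacrifice", "priority wait"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    if scheme == "commit" or not ic_priorities:
        return "commit"
    if scheme == "priority abort":
        # Abort only if every IC transaction has a higher priority.
        return "abort" if validating_priority < min(ic_priorities) else "commit"
    some_higher = validating_priority < max(ic_priorities)
    if scheme == "priority sacrifice":
        return "abort" if some_higher else "commit"
    return "wait" if some_higher else "commit"      # priority wait
```

With priorities [4, 7] in the IC set and a validating priority of 5, priority abort commits, priority sacrifice aborts, and priority wait waits.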


3.2.  Procedural  Description 

To execute the proposed protocol, the system maintains an object table and a transaction
table. The object table entries maintain the following information:

RTS:  the  largest  timestamp  of  the  committed  transactions  that  read  the  data  object;  and 
WTS:  the  largest  timestamp  of  the  committed  transactions  that  wrote  the  data  object. 

The  transaction  table  entries  maintain  the  following  information: 

RS(T): read set of transaction T;
WS(T): write set of transaction T; and
TI(T): timestamp interval of transaction T.

We assume that the write set of a transaction is a subset of its read set and there is no
blind write. In addition to the timestamp interval assigned to each active transaction, a
final timestamp, denoted as TS(T), is assigned to each committed transaction, T, that has
passed the validation test.

The  read,  validation  and  write  phase  of  transaction  execution  with  the  proposed  protocol 
can  be  summarized  as  follows: 

If T is not aborted during the real-time conflict resolution (if any), then it is validated
and committed. The execution of T should be reflected in the serialization order of committed
transactions. Thus a final timestamp for T should be chosen such that the
order induced by the final timestamp does not destroy the serialization order constructed
by the already committed transactions. In fact, any timestamp in the range of TI(T) satisfies
this condition because TI(T) preserves the order induced by all committed transactions.
Hence any timestamp from TI(T) can be chosen for the final timestamp. Then RTS and
WTS for all data objects that T accessed should be updated, if necessary, and finally, the
timestamp intervals of all the RC transactions should be adjusted.
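
The commit step just described can be sketched as follows. This is an illustration under several assumptions that are not part of the protocol description: timestamps are non-negative integers, intervals are closed and stored as two-element lists [lo, hi], the midpoint of TI(T) is picked as the final timestamp (any value in the interval would do), and the caller already knows which RC transactions must be serialized before or after T. The function name is hypothetical.

```python
def commit_validated(ti, read_set, write_set, rts, wts, rc_before, rc_after):
    """Commit a validated transaction T and return its final timestamp TS(T).

    ti        -- [lo, hi], the timestamp interval TI(T)
    rts, wts  -- dicts mapping data object -> RTS / WTS
    rc_before -- intervals of RC transactions to be serialized before T
    rc_after  -- intervals of RC transactions to be serialized after T
    """
    lo, hi = ti
    if lo > hi:
        raise ValueError("TI(T) has shut out; T must be restarted")
    ts = (lo + hi) // 2                      # arbitrary choice inside TI(T)
    for x in read_set:
        rts[x] = max(rts.get(x, 0), ts)      # updated only if ts is larger
    for x in write_set:
        wts[x] = max(wts.get(x, 0), ts)
    for other in rc_before:
        other[1] = min(other[1], ts - 1)     # must end up with a smaller timestamp
    for other in rc_after:
        other[0] = max(other[0], ts + 1)     # must end up with a larger timestamp
    return ts
```

The RC transactions are not aborted; only their intervals shrink, and a transaction is restarted later only if its interval becomes empty.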

4.  Conclusions 

Time-critical scheduling in real-time database systems has two components: real-time
scheduling and concurrency control. While both concurrency control and real-time
scheduling are well-developed and well-understood, there is only limited knowledge about the
integration of concurrency control and real-time scheduling. Though recently the problem has
been studied actively, the proposed solutions are still at an initial stage. A major source
of problems in integrating the two is the lack of coordination in the development. They
are developed on different objectives and incompatible assumptions (Buchmann 1989).

Most of the proposed work for real-time concurrency control employs a simple method
that utilizes one concurrency control scheme, such as 2PL, TO or OCC, and considers
the priority of operations inherited from the timing constraints of transactions in operation
scheduling. This method has an inherent disadvantage of being limited by the concurrency
control method used as the base. Since neither pessimistic nor optimistic concurrency
control is satisfactory by itself for real-time scheduling, this simple method using only one
control can hardly satisfy the timing requirements of RTDBS. Problems such as excessive
blocking, wasted restarts, and priority inversion are serious in RTDBS.

In this paper, we proposed two real-time transaction scheduling protocols which employ
a hybrid approach, i.e., a combination of both pessimistic and optimistic approaches. These
protocols make use of a new conflict resolution scheme called dynamic adjustment of
serialization order, which supports priority-driven scheduling, and avoids unnecessary aborts.

Abstract. Real-time database systems must maintain consistency while minimizing the number of transactions
that miss the deadline. To satisfy both the consistency and real-time constraints, there is the need to integrate
synchronization protocols with real-time priority scheduling protocols. One of the reasons for the difficulty in
developing and evaluating database synchronization techniques is that it takes a long time to develop a system,
and evaluation is complicated because it involves a large number of system parameters that may change dynamically.
This paper describes an environment for investigating distributed real-time database systems. The environment
is based on a concurrent programming kernel that supports the creation, blocking, and termination of processes,
as well as scheduling and interprocess communication. The contribution of the paper is the introduction of a
new approach to system development that utilizes a module library of reusable components to satisfy three major
goals: modularity, flexibility, and extensibility. In addition, experiments for real-time concurrency control techniques
are presented to illustrate the effectiveness of the environment.

Key Words: Distributed database, prototyping, synchronization, transaction, real-time.


1.  Introduction 

In this paper, we report our experiences with a new approach to integrated development
and evaluation of real-time distributed database systems, and present experimental results
of various real-time synchronization techniques. The goal of the project is to test the
hypothesis that a host environment can be used to significantly accelerate the rate at which
we can perform experiments in the areas of operating systems, databases, and network
protocols for real-time systems. A tool for developing components of real-time distributed
systems and integrating them to evaluate design alternatives is essential for the advance
of real-time computing technology. To the best of our knowledge, this is the first successful
attempt to develop such a tool as an environment consisting of a hybrid of actual
implementation and simulation.

As computers are becoming an essential part of real-time systems, real-time computing
is emerging as an important discipline in computer science and engineering [1]. The
growing importance of real-time computing in a large number of applications, such as aerospace
and defense systems, industrial automation, and nuclear reactor control, has resulted in
an increased research effort in this area. Researchers working on developing real-time
systems based on distributed system architecture have found out that database managers
are assuming much greater importance in real-time systems. In the recent workshops,
developers of “real” real-time systems pointed to the need for basic research in database
systems that satisfy timing constraint requirements in collecting, updating, and retrieving
shared data [2, 3]. Further evidence of its importance is the recent growth of research in
this field and the announcements by some vendors of database products that include features
achieving high availability and predictability [4].

In  addition  to  providing  relational  access  capabilities,  distributed  real-time  database 
systems  offer  a  means  of  loosely  coupling  software  processes,  making  it  easier  to  rapidly 
update  software,  at  least  from  a  functional  perspective.  However,  with  respect  to  time-driven 
scheduling and system-timing predictability, they present new problems. One of the
characteristics of current database managers is that they do not schedule their transactions to
meet response requirements, and they commonly lock data tables indiscriminately to assure
database consistency. Locks and time-driven scheduling are basically incompatible:
low-priority transactions can and will block higher-priority transactions, leading to response
requirement failures. New techniques are required to manage database consistency in a way that
is compatible with time-driven scheduling and the essential system response predictability/
analyzability it brings. One of the primary reasons for the difficulty in successfully developing
and evaluating new database techniques is that it takes a long time to develop a system,
and evaluation is complicated because it involves a large number of system parameters that
may change dynamically.

A prototyping technique can be applied effectively to the evaluation of database techniques
for distributed real-time systems. In this paper, we report our experiences with a
new  database  prototyping  environment.  It  is  constructed  to  support  research  in  distributed 
database  and  operating  system  technology  for  real-time  applications.  A  database  proto¬ 
typing  environment  is  a  software  package  that  supports  the  investigation  of  the  properties 
of  database  techniques  in  an  environment  other  than  that  of  the  target  database  system. 
The  advantages  of  an  environment  that  provides  prototyping  capability  are  obvious.  First, 
it is cost effective. If experiments for a 20-node distributed database system can be executed
in a software environment, it is not necessary to purchase a 20-node distributed
system, thereby reducing the cost of evaluating design alternatives. Second, design alternatives
can be evaluated in a uniform environment with the same system parameters, making
for a fair comparison. Finally, as technology changes, the environment need only be updated
to provide researchers with the ability to perform new experiments.

A  prototyping  environment  can  reduce  the  time  of  evaluating  new  technologies  and  design 
alternatives.  From  our  past  experience,  we  assume  that  a  relatively  small  portion  of  a  typical 
database  system’s  code  is  affected  by  changes  in  specific  control  mechanisms  whereas  the 
majority of code deals with intrinsic problems, such as file management. Thus, by properly
isolating technology-dependent portions of a database system using modular programming
techniques, we can implement and evaluate design alternatives very rapidly. In addition,
a prototyping environment provides a friendlier development environment than a target
hardware system. The bare-machine environment is the worst possible place in which to
explore new software concepts. For example, even the recovery of the event history leading
up to an error in a distributed system can be a difficult and, in some cases, impossible
task. Debugging is greatly facilitated in a prototyping environment. The symbolic debugger
of our environment supports the examination of an arbitrary number of execution threads.
As  a  result,  the  state  of  a  distributed  computation  can  be  examined  as  a  whole. 

Although there exist tools for system development and analysis, few prototyping tools
exist for distributed database experimentation, especially for distributed real-time database
systems. Recently, simulators have been developed for investigating the performance of several
concurrency control algorithms for real-time applications [5, 6]. However, they do not provide
a module hierarchy composed from reusable components as in our prototyping
environment. Software developed in our prototyping environment will execute on a given
target machine without modification of any layer except the hardware interface. In addition,
because our environment is a hybrid of prototyping and simulation (i.e., partially
implemented and partially simulated), we can easily capture important timing features of
the system, which is very hard to do using simulation only.

A  database  system  must  operate  in  the  context  of  available  operating  system  services. 
In  other  words,  database  operations  need  to  be  coherent  with  the  operating  system,  because 
correct functioning and timing behavior of database control algorithms depend on the services
of the underlying operating system. Unless one has control over the operating
system, investigating the timing behavior of a database system does not provide much information.
An environment for database systems development must, therefore, provide facilities
to support operating system functions and integrate them with database systems for
experimentation.

Another important use of a prototyping environment is to analyze the reliability of database
control mechanisms and techniques. Because distributed systems are expected to work correctly
under various failure situations, the behavior of distributed database systems in degraded
circumstances needs to be well understood. Although new approaches for synchronization
and checkpointing for distributed databases have been developed recently [7-11],
experimentation to verify their properties and to evaluate their performance has not been
performed due to the lack of appropriate test tools.

When a database system is developed, functional completeness and performance of the
system are of primary concern. The resulting systems are often not layered or modular
in their implementation. However, for experimentation, a layered implementation approach
facilitates the rapid evaluation of new techniques. Such a facility significantly improves
the capability of the system designer to compare design alternatives in a uniform environment.
In this regard, the concept of developing a methodology for layered implementation
of the system and building a library of modules with different performance/reliability
characteristics for operating system and database system functions seems promising. The
prototyping environment we have developed follows this approach [12, 13].

The  rest  of  the  paper  is  organized  as  follows.  Section  2  presents  an  informal  description 
of  a  message-based  simulation.  Section  3  describes  the  design  principles  and  the  current 
implementation of the prototyping environment. Section 4 presents experiments with
priority-based synchronization algorithms and multiversion data objects using the prototyping
environment. Section 5 concludes the paper.


2. Message-Based Simulation

When prototyping distributed database systems, there are two possible approaches: sequential
programming and distributed programming based on message-passing. Message-based
simulations, in which events are message communications, do not provide additional expressive
power over standard simulation languages; message-passing can be simulated in
many  discrete-event  simulation  languages  including  SIMSCRIPT  [14]  and  GPSS  [15]. 
However,  a  message-based  simulation  can  be  used  as  an  effective  tool  for  developing  a 
distributed  system  because  the  simulation  “looks”  like  a  distributed  program,  whereas  a 
simulation  program  written  in  a  traditional  simulation  language  is  inherently  a  sequential 
program.  Furthermore,  if  a  simulation  program  is  developed  in  a  systematic  way  such  that 
the  principles  of  modularity  and  information  hiding  are  observed,  most  of  the  simulation 
code can be used in the actual system, resulting in a reduced cost for system development
and  evaluation. 

To  prototype  a  distributed  database  system  on  a  single-host  machine,  it  is  necessary  to 
provide virtual machines for each node of the system being simulated. For that, the process
view of a system has been adopted. A distributed system being simulated consists of
a number of processes that interact with others at discrete instants of time. Processes are
basic building blocks of a simulation program. A process is an independent, dynamic entity
that manipulates resources to achieve its objectives. A resource is a passive object and
may be represented by a simple variable or a complex data structure. A simulation program
models the dynamic behavior of processes, resources, and their interactions as they
evolve  in  time.  Each  physical  operation  of  the  system  is  simulated  by  a  process,  and  the 
process  interactions  are  called  events. 

In the literature, the notion of a process has been given numerous definitions. The definition
used in our model is much the same as that given in [16]: A process is the execution
of an interruptible sequential program and represents the unit of resource allocation, such
as the allocation of CPU time, main memory, and I/O devices.

We use the client/server paradigm for process interaction in the prototyping environment.
The system consists of clients and servers, which are processes that cooperate
for the purpose of transaction processing. Each server provides a service to its clients,
where a client can request a service by sending a request message (a message of type request)
to the corresponding server. The composition structure of the system to be modeled
can be characterized by the way clients and servers are mapped into processes. For example,
a server might consist of a fixed number of processes, each of which may execute
requests  from  every  transaction,  or  it  might  consist  of  a  varying  number  of  processes,  each 
of  which  executes  on  behalf  of  exactly  one  transaction. 

Internal actions of a process, i.e., actions that do not involve interactions with other processes
in the system, are modeled either by the passage of simulation time or by the execution
of sequential statements within the process. We use a simulator clock to represent the
passage  of  time  in  a  simulation.  The  simulator  clock  advances  in  discrete  steps  where  each 
step  simulates  the  passage  of  time  between  two  events  in  the  system. 

In a physical system, each process makes independent progress in time if the resources
it needs are available, and many processes execute in parallel. In its simulation, the multiple
processes of a physical system must be executed simultaneously on one processor. This
simultaneity is achieved in the prototyping environment by supporting the simultaneous
execution of multiple processes in a single address space.
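
The idea of many processes sharing one address space and one simulator clock can be illustrated with a minimal sketch. It is not the StarLite implementation; the names `Simulator`, `spawn`, and `worker` are hypothetical, and Python generators stand in for processes. Yielding a delay plays the role of holding the process for that much simulated time, and the clock jumps from one event to the next.

```python
import heapq
from typing import Generator, List, Tuple


class Simulator:
    """Runs generator-based processes against one virtual clock (illustrative only)."""

    def __init__(self) -> None:
        self.now = 0.0   # current virtual time
        self._seq = 0    # tie-breaker so simultaneous events keep insertion order
        self._events: List[Tuple[float, int, Generator]] = []

    def spawn(self, proc: Generator) -> None:
        """Make a process ready to run at the current virtual time."""
        self._schedule(0.0, proc)

    def _schedule(self, delay: float, proc: Generator) -> None:
        heapq.heappush(self._events, (self.now + delay, self._seq, proc))
        self._seq += 1

    def run(self) -> None:
        """Advance the clock event by event until no process remains."""
        while self._events:
            when, _, proc = heapq.heappop(self._events)
            self.now = when              # discrete step to the next event
            try:
                delay = next(proc)       # run the process until it holds again
            except StopIteration:
                continue                 # process terminated
            self._schedule(delay, proc)


def worker(sim: Simulator, name: str, step: float, count: int, log: list):
    """A process that holds for `step` time units, then records the time."""
    for _ in range(count):
        yield step
        log.append((sim.now, name))
```

Because all processes live in one address space, a shared structure such as `log` can be inspected at any virtual instant without disturbing the timing being measured.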

A  message-based  prototyping  environment  can  be  of  enormous  benefit  in  designing  and 
testing  emerging  systems,  such  as  real-time  systems,  and  in  comparing  and  improving 
algorithms that are applicable to many different systems. One such benefit is that the software
to be used in an actual system can be developed using the environment. The prototyping
environment can support a simulated environment, actual hardware, or a “hybrid”
mode in which some of the modules are implemented in hardware and some are simulated.
In  this  way,  it  is  irrelevant  to  the  software  developer  using  the  environment  whether  or 
not  all  or  part  of  the  software  is  running  on  hardware.  When  the  system  is  running  in  a 
hybrid  mode,  the  virtual  clock  used  for  performance  measurement  is  updated  by  the  actual 
time  used  for  direct  execution,  making  performance  measurements  correct. 
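
One way to picture the hybrid-mode clock is the following sketch, which is an assumption-laden illustration rather than the environment's actual mechanism. The class `VirtualClock` and its methods are hypothetical names: simulated activity advances the clock by a stated delay, while directly executed code advances it by the measured wall-clock time of the call.

```python
import time
from typing import Any, Callable


class VirtualClock:
    """Virtual clock that mixes simulated delays with measured execution time."""

    def __init__(self, timer: Callable[[], float] = time.perf_counter) -> None:
        self.now = 0.0
        self._timer = timer   # injectable so the behavior can be tested deterministically

    def hold(self, delay: float) -> None:
        """Simulated activity: advance virtual time by a modeled delay."""
        self.now += delay

    def run_direct(self, func: Callable[..., Any], *args: Any) -> Any:
        """Direct execution: charge the real elapsed time to the virtual clock."""
        start = self._timer()
        result = func(*args)
        self.now += self._timer() - start
        return result
```

Injecting the timer is a design choice made for this sketch only; it lets the same code be checked with a fake time source.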


3. Structure of the Prototyping Environment

The prototyping environment is designed to facilitate easy extension and modification.
Server processes can be created and relocated, and new implementations of server processes
can be dynamically substituted. The prototyping environment efficiently supports a spectrum
of real-time database functions at the operating level and facilitates the construction
of multiple database systems with different characteristics. For experimentation, system
functionality can be adjusted according to application-dependent requirements without much
overhead for a new system setup. Because one of the design goals of the prototyping environment
is to conduct an empirical evaluation of the design and implementation of real-time
distributed database systems, it has built-in support for performance measurement
of both elapsed time and blocked time for each transaction.

The  prototyping  environment  provides  support  for  transaction  processing,  including 
transparency to concurrent access, data distribution, and atomicity. An instance of the prototyping
environment can manage any number of virtual sites specified by the user. Modules
that implement transaction processing are decomposed into several server processes, and
they communicate among themselves through ports. The clean interface between server
processes simplifies incorporating new algorithms and facilities into the prototyping environment
or testing alternate implementations of algorithms. To permit concurrent transactions
on a single site, there is a separate process for each transaction that coordinates
with  other  server  processes. 

Figure 1 illustrates the structure of the prototyping environment. The prototyping environment
is based on a concurrent programming kernel, called the StarLite kernel. The
StarLite kernel supports process control to create, ready, block, and terminate processes.
It also supports the semaphore abstraction, used by higher-level modules in resource
control, critical section implementation, and synchronous message passing. The internal
structure of the kernel follows the well-known client-server model [17], in which most of
the operating system operates as server processes in the same address space as client processes,
with the kernel merely handling message communication between various processes.
Figure 2 shows an instance of this model. This structure is particularly useful for extensible
systems such as our prototyping environment, as additional or alternative functionality
can easily be provided by creating a new server, instead of changing and recompiling the
kernel.

The scheduler in the kernel maintains a virtual clock and provides the hold primitive to control
the passage of time. The benefit of a virtual clock is that any number of performance
monitoring  operations  may  be  performed  at  an  instant  of  virtual  time.  If  a  physical  clock 
were  embedded,  the  monitoring  activities  themselves  would  interfere  with  other  system 
activities and add to the execution time, resulting in incorrect performance measures.

Figure 1. Structure of the prototyping environment.

The kernel also provides the capability of isolating the overhead imposed by each system
component.  For  instance,  total  time  at  each  node  can  be  divided  into  CPU  time  and  I/O 
time,  to  determine  the  computation-intensive  and  I/O-intensive  functions  and  investigate 
the distribution of tasks around the system so as to maximize parallelism. The user interface
(UI) is a front end invoked when the prototyping environment begins. The UI is menu
driven and designed to be flexible in allowing users to experiment with various configurations
and different system parameters. A user can specify the following:

1. System configuration: number of sites and the number of server processes at each site,
topology  and  communication  costs 

2.  Database  configuration:  database  at  each  site  with  user-defined  structure,  size,  granularity, 
and  levels  of  replication 

3.  Load  characteristics:  number  of  transactions  to  be  executed,  size  of  their  read-sets  and 
write-sets,  transaction  types  (read-only  or  update)  and  their  priorities,  and  the  mean 
interarrival  time  of  transactions 

4.  Concurrency  control:  locking,  timestamp  ordering,  and  priority  based. 

The UI initiates the configuration manager (CM), which initializes necessary data structures
for transaction processing based on user specification. The database at each site consists
of a different number of files, and each file consists of a different number of records.
The database structure can be made complicated if necessary. However, we use a simple
file access because investigating synchronization problems does not require complex database
structures. 

The  CM  invokes  the  transaction  generator  at  an  appropriate  time  interval  to  generate 
the  next  transaction  to  form  a  Poisson  process  of  transaction  arrival.  The  environment  is 
flexible  enough  to  generate  any  number  of  transactions  with  different  characteristics.  The 
user can specify his or her own procedure for transactions. At initialization time, the user-specified
procedure is converted into a transaction process. Furthermore, the prototyping
environment supports a facility that allows mixing system-generated transactions with
user-specified ones. It is very desirable to have such a capability, as the user can set up
any workload that represents the situation to be simulated, with or without system-generated
background  workload. 
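
A Poisson arrival process is commonly generated by drawing exponentially distributed interarrival times whose mean is the user-specified mean interarrival time. The sketch below shows that standard technique; the function name `arrival_times` and its parameters are hypothetical and are not taken from the environment's transaction generator.

```python
import random
from typing import List


def arrival_times(mean_interarrival: float, count: int,
                  rng: random.Random) -> List[float]:
    """Return `count` arrival instants of a Poisson process.

    Interarrival gaps are exponential with the given mean, which is
    what makes the resulting arrival stream a Poisson process.
    """
    t = 0.0
    times = []
    for _ in range(count):
        # expovariate takes the rate (1 / mean), not the mean itself.
        t += rng.expovariate(1.0 / mean_interarrival)
        times.append(t)
    return times
```

Passing a seeded `random.Random` instance makes a workload repeatable, so that competing design alternatives can be compared against the same arrival sequence.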

A  transaction  is  distinguished  from  the  other  processes  in  the  system  by  its  behavior. 
To the system, the only distinction between transactions and server processes is the PortTags
on which each receives messages. When a transaction is generated, it is assigned an
identifier that is unique among all transactions in the system. Each transaction is also assigned
a globally unique timestamp, whose definition is hidden within a single module. The advantage
of separating the definition and assignment of the timestamp from its use is that it provides
a means of uniquely assigning timestamps that is independent of any specific implementation.

The  timestamp  assignment  is  closely  related  to  the  clocks  in  the  system.  In  a  sequential 
simulation,  a  single  clock  suffices  to  order  events  in  the  system.  An  event  is  taken  off  the 
event  queue,  and  the  global  clock  is  advanced  to  the  time  required  for  the  event  to  occur. 
Events  are  related  in  time  by  their  relation  to  the  global  clock.  In  prototyping  distributed 
environments, no such global clock is available. Time is referred to by local clocks, which
are maintained at each site and visible only to processes at that site. Ordering of events in
terms of the global time, therefore, depends on the proper synchronization of local clocks.
In our environment, clocks are synchronized by intersite communication. An intersite
message includes the clock value of the sender site at the time the message is sent. If the
sum of this clock value and the propagation delay between the sites is greater than the
clock value at the receiver site, the receiver increments its clock by the difference between
the sum and its clock value. In this way, all succeeding events at the receiver site
can be said to occur after the sending of the message. This satisfies our intuitive notion
of the “happens before” relationship [18].
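
The clock adjustment rule described above can be written down directly. The following is a minimal sketch of that rule only; the function name `receive_clock` and its parameter names are hypothetical.

```python
def receive_clock(local_clock: float, sender_clock: float,
                  propagation_delay: float) -> float:
    """Return the receiver's clock value after an intersite message arrives.

    If the sender's clock plus the propagation delay is ahead of the
    receiver's clock, the receiver advances by the difference; otherwise
    its clock is left unchanged.
    """
    arrival = sender_clock + propagation_delay
    if arrival > local_clock:
        local_clock += arrival - local_clock
    return local_clock
```

For example, a receiver at time 10 that gets a message stamped 12 over a link with delay 3 moves to 15, while a receiver already at time 20 stays at 20.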

Transaction  execution  consists  of  read  and  write  operations.  Each  read  or  write  opera¬ 
tion  is  preceded  by  an  access  request  sent  to  the  resource  manager,  which  maintains  the 
local  database  at  each  site.  Each  transaction  is  assigned  to  the  transaction  manager  (TM). 
The  TM  issues  service  requests  on  behalf  of  the  transaction  and  reacts  appropriately  to 
the request replies. For instance, if a transaction requests access to a file and that file is
locked, the TM executes either a blocking operation to wait until the data object can be accessed
or an abort procedure, depending on the situation. If granting access to a resource would
produce a deadlock, the TM receives an abort response and aborts the transaction. Transactions
commit in two phases. The first commit phase consists of at least one round of messages
to determine if the transaction can be globally committed. Additional rounds may be used
to handle potential failures. The second commit phase causes the data objects to be written
to the database for successful transactions. The TM executes the two commit phases to ensure
that a transaction commits or aborts globally. Figure 3 illustrates a queueing model adopted
for  transaction  processing. 
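
The two commit phases can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single voting round, omits the additional rounds for failure handling, and replaces message passing with direct method calls. The class `Participant` and the function `two_phase_commit` are hypothetical names, not part of the environment.

```python
from typing import Dict, List


class Participant:
    """One site's resource manager (hypothetical interface for illustration)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit          # the vote this site will cast
        self.database: Dict[str, int] = {}
        self.pending: Dict[str, int] = {}

    def prepare(self, writes: Dict[str, int]) -> bool:
        """Phase 1: stage the writes and vote on global commit."""
        self.pending = dict(writes)
        return self.can_commit

    def commit(self) -> None:
        """Phase 2: write the staged data objects to the database."""
        self.database.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        """Discard staged writes without touching the database."""
        self.pending = {}


def two_phase_commit(participants: List[Participant],
                     writes: Dict[str, Dict[str, int]]) -> bool:
    """Commit at every site or at none; returns True on global commit."""
    # Phase 1: collect a vote from every participating site.
    votes = [p.prepare(writes.get(p.name, {})) for p in participants]
    if all(votes):
        # Phase 2: every site voted yes, so all of them write.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```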


Figure 3. Simulation model.

Transactions are generated and put into the start-up queue. When a transaction is started,
it leaves the start-up queue and enters the ready queue. The transaction at the top of the queue
is  selected  to  run.  The  current  running  transaction  sends  requests  to  the  concurrency  con¬ 
troller  (CC)  implemented  in  the  resource  manager.  The  transaction  may  be  blocked  and 
placed  in  the  block  queue.  It  may  also  be  aborted  and  restarted.  In  such  a  case,  it  is  first 
delayed  for  a  certain  amount  of  time  and  then  put  in  the  ready  queue  again.  When  a  trans¬ 
action  in  the  block  queue  is  unblocked,  it  leaves  the  block  queue  and  is  placed  in  the  ready 
queue  again. 

In  prototyping  distributed  database  systems,  a  communication  network  is  an  important 
component  to  be  simulated  because  the  system  performance  depends  heavily  on  the  topology 
and communication protocols used. However, in many database simulators, the communication
subsystem is either ignored or simplified by adding communication cost to the transaction
processing time. Our prototyping environment uses a different approach by providing
a virtual communication network that actually runs a layered communication protocol
on a network topology specified by the user. Because the communication module
is  a  separate  building  block  in  the  prototyping  environment,  the  user  can  change  it  to  simulate 
different  requirements  of  the  application. 

The  message  server  (MS)  is  a  process  listening  on  a  well-known  port  for  messages  from 
remote  sites.  When  a  message  is  sent  to  a  remote  site,  it  is  placed  on  the  message  queue 
of  the  destination  site  and  the  sender  blocks  itself  on  a  private  semaphore  until  the  message 
is  retrieved  by  MS.  If  the  receiving  site  is  not  operational,  a  time-out  mechanism  will  unblock 
the sender process. When MS retrieves a message, it wakes the sender process and forwards
the message to the proper servers or TM. The prototyping environment supports
both Ada-style rendezvous (synchronous) and asynchronous message passing. Interprocess
communication within a site does not go through the message server; processes
send  and  receive  messages  directly  through  their  associated  ports.  The  interprocess  com¬ 
munication  structure  is  designed  to  provide  a  simple  and  flexible  interface  to  the  client 
processes  of  the  application  software  independent  of  the  low-level  hardware  configurations. 
It  is  split  into  three  levels  of  hierarchy:  transport,  network  and  physical  layers. 

The transport layer is the interface to the application software; thus it is designed to be
as abstract as possible in order to support different port structures and various message
types. In addition, application-level processes need not know the details of the destination
device. The invariant built into the design of the interprocess communication interface is
that the application-level sender allocates the space for a message, and the receiver deallocates
it. Thus, it is irrelevant whether or not the sender and receiver share memory space, i.e.,
whether or not the physical layer on the sender’s side copies the message into a buffer
and deallocates it at the sender’s site, and the physical layer at the receiver’s site allocates
space for the message. This enables prototyping distributed systems or multiprocessors
with no shared memory, as well as multiprocessors with shared memory space. When the
latter  is  prototyped,  only  addresses  need  to  be  passed  in  messages  without  intermediate 
allocation  and  deallocation. 

The  physical  layer  of  message  passing  simulates  the  physical  sending  and  receiving  of 
bits  over  a  communication  medium,  i.e.,  it  is  for  intersite  message  passing.  The  device 
number in the interface is simply a cardinal number; this enables the implementation to
be simple and extensible enough to support any application. To simulate sending or to
actually send over an Ethernet in the target system, for example, a module could map network
addresses onto the cardinal numbers. To send from one processor to another in a
distributed  system,  the  cardinals  can  represent  processor  numbers. 

Messages are passed to specific processes at specific sites in the network layer of the communication
interface. This layer separates the transport and the physical layers so that the
transport-layer interface can be processor and process independent and the physical-layer
interface need be concerned only with the sending of bits from one site to another. The transport-layer
interface of the communication subsystem is implemented in the transport module.
A transport-level Send is made to an abstraction called a PortTag. This abstraction is
advantageous because the implementation (i.e., what a PortTag represents) is hidden in
the  Ports  module.  Thus  the  PortTag  can  be  mapped  onto  any  port  structure  or  the  reception 
point  of  any  other  message  passing  system.  The  transport-level  Send  operation  builds  a 
packet  consisting  of  the  sender’s  PortTag,  used  for  replies,  the  destination  PortTag,  and 
the  address  of  the  message.  It  then  retrieves  from  the  destination  PortTag  the  destination 
device  number.  If  this  number  is  the  same  as  the  sender’s,  the  Send  is  an  intrasite  message 
communication, and hence the network-level Send is performed. Otherwise the Send requires
the physical module for intersite communication. Note that accesses to the implementation
details of the PortTag are restricted to the module that actually implements it; this
enables  changing  the  implementation  without  recompiling  the  rest  of  the  system. 
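
The dispatch performed by the transport-level Send can be sketched as below. This is an illustration only: the fields of `PortTag` and `Packet`, and the callbacks `network_send` and `physical_send`, are hypothetical stand-ins for the Ports, network, and physical modules, whose real interfaces are not given in the text. A Python object reference plays the role of the message address.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PortTag:
    """Opaque reception point; in this sketch it records a device and a port."""
    device: int
    port: int


@dataclass
class Packet:
    reply_to: PortTag      # sender's PortTag, used for replies
    destination: PortTag
    message: Any           # stands in for the address of the message


def transport_send(sender: PortTag, destination: PortTag, message: Any,
                   network_send: Callable[[Packet], None],
                   physical_send: Callable[[int, Packet], None]) -> str:
    """Build a packet and route it by comparing device numbers."""
    packet = Packet(sender, destination, message)
    if destination.device == sender.device:
        # Same device number: intrasite, so the network level delivers it.
        network_send(packet)
        return "intrasite"
    # Different device number: the physical module carries it between sites.
    physical_send(destination.device, packet)
    return "intersite"
```

Keeping the device lookup behind `PortTag` mirrors the information hiding described above: callers of `transport_send` never inspect how a reception point is represented.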

The performance monitor interacts with the transaction managers to record the priority/
timestamp and read/write data set for each transaction, the time when each event occurred,
statistics for each transaction, and the CPU-hold interval in each node. The statistics for a
transaction include arrival time, start time, total processing time, blocked interval, whether
the deadline was missed or not, and the number of aborts.

Because  each  TM  is  a  separate  process,  each  has  its  own  data  area  in  which  to  keep 
track  of  the  time  when  a  service  request  is  sent  out  and  the  time  the  response  arrives,  as 
well  as  the  time  when  a  transaction  begins  blocking,  waiting  for  a  resource,  and  the  time 
the  resource  is  granted.  When  a  transaction  commits,  it  calls  a  procedure  that  records  the 
above  measures;  when  the  simulation  clock  has  expired,  these  measures  are  printed  out 
for all transactions.


4.  Prototyping  Real-Time  Database  Systems 

Section 3 described the structure of the prototyping environment with some of its advanced
features. In this section, we present real-time database systems implemented using the prototyping
environment. The objectives of our study using the prototyping environment are
(1) to evaluate the prototyping environment itself in terms of correctness, functionality, and
modularity, and (2) to compare performance between two-phase locking and priority-based
synchronization algorithms and between a multiversion database and its corresponding single-version
database, through the sensitivity study of key parameters that affect performance.

Compared  with  traditional  databases,  real-time  database  systems  have  a  distinct  feature: 
they must satisfy the timing constraints associated with transactions. In other words, “time”
is one of the key factors to be considered in real-time database systems. The timing constraints
of a transaction typically include its ready time and deadline, as well as temporal
consistency of the data accessed by it. Transactions must be scheduled in such a way that
they can be completed before their corresponding deadlines expire. For example, both the
update and query on the tracking data of a missile must be processed within the given deadlines;
otherwise the information provided could be of little value. In such a system, transaction
processing  must  satisfy  not  only  the  database  consistency  constraints  but  also  the  timing 
constraints. 

The prototyping environment we have developed is especially useful for investigating the
timing behavior of real-time transactions, as we can control all the system components. An
alternative to the prototyping approach is to develop a system on a bare machine, based
on a specialized real-time kernel. The ARTS [19] and the RT-CARAT [20] systems take
this approach. Difficulties with such an approach are that (1) it takes much more effort
to develop, (2) the system is strongly coupled with its hardware, so its timing characteristics
are hard to change when needed, and (3) the system is not portable, as it is implemented
in the target environment.


4.1.  Steady-State  Estimation 

To show that the results we get from experiments represent the performance of the system
in steady state, we performed experiments to check that, once the system has been allowed
to run longer than a certain threshold value, the variation in results stays within some
tolerable interval. We have implemented a well-known synchronization protocol, two-phase
locking (2PL), for the following system and workload configuration:

8 sites with fully interconnected network
multiprogramming  level  of  10 
75%  read-only  and  25%  update  transactions 
read-only  transactions  access  3%  of  the  database 
update  transactions  access  1%  of  the  database 
database  consists  of  500  unreplicated  objects 
Poisson  distribution  of  transaction  arrivals 

Figure 4 shows the average response time of transactions using 2PL. The average response
time begins to stabilize at 3000 simulation time units and varies only slightly from then on.
The lower response times up to 3000 time units are due to the first set of transactions,
which benefit from a lower initial multiprogramming level and fewer potential conflicts.
In addition, transactions requiring longer execution times raise the average response time
only when they complete, so those in the initial group of transactions do not contribute
to the average during the early stage of execution. These initial characteristics are
gradually erased from the average performance.

In addition, as we increase the time for experiments, the average response time is
determined from an increasing number of transactions. For example, at 100 time units, the number
of transactions contributing to the mean is approximately 12. At 4000, it is approximately
60. Thus the overall behavior of the system becomes less and less subject to the behavior
of individual transactions. From the graph and the characteristics of our environment, we
concluded that an experiment must run at least 3500 time units before it starts to capture the
steady-state behavior of the system.

Figure 4. Response-time stability.
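The stabilization check described above can be sketched as follows. This is only an illustrative sketch, not the procedure used in the experiments: the function names, the relative-tolerance criterion, and the sample data are all assumptions introduced here.

```python
def running_means(completions):
    """Return (time, cumulative mean response time) after each completion.

    `completions` is a list of (completion_time, response_time) pairs,
    sorted by completion_time.
    """
    means = []
    total = 0.0
    for count, (t, response) in enumerate(completions, start=1):
        total += response
        means.append((t, total / count))
    return means


def steady_state_time(means, tolerance):
    """Earliest time from which every later running mean stays within
    `tolerance` (relative) of the final running mean; None if no data."""
    if not means:
        return None
    final = means[-1][1]
    start = means[-1][0]
    # Walk backwards while the running mean stays inside the tolerance band.
    for t, mean in reversed(means):
        if abs(mean - final) > tolerance * abs(final):
            break
        start = t
    return start


# Hypothetical measurements: short early transactions, then a stable mix.
sample = [(100, 10.0), (200, 12.0), (1000, 30.0), (3000, 40.0),
          (3500, 41.0), (4000, 40.0), (5000, 40.5)]
means = running_means(sample)
threshold = steady_state_time(means, tolerance=0.10)
```

With a criterion of this kind, the threshold moves later as the tolerance is tightened, which is one reason to choose the run length conservatively.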


4.2.  Priority-Based  Synchronization 

Real-time databases are often used by applications such as tracking. Tasks in such
applications consist of both computing (signal processing) and database accessing (transactions).
A task can have multiple transactions, which consist of a sequence of read and write
operations operating on the database. Each transaction will follow the two-phase locking
protocol, which requires a transaction to acquire all the locks before it releases any lock. Once
a transaction releases a lock, it cannot acquire any new lock. A high-priority task will
preempt the execution of lower-priority tasks unless it is blocked by the locking protocol
at the database.
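The two-phase rule stated above (no new lock after the first release) can be illustrated with a minimal sketch. The class and exception names are hypothetical, and conflicts between transactions are deliberately not modeled; only the growing/shrinking discipline of a single transaction is shown.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after it has unlocked."""


class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def lock(self, obj):
        # Growing phase only: acquiring is illegal once any lock was released.
        if self.shrinking:
            raise TwoPhaseViolation(
                f"{self.name} cannot lock {obj} after releasing a lock")
        self.held.add(obj)

    def unlock(self, obj):
        self.held.remove(obj)
        self.shrinking = True  # the transaction enters its shrinking phase


txn = Transaction2PL("T")
txn.lock("a")
txn.lock("b")
txn.unlock("a")
try:
    txn.lock("c")
    violated = False
except TwoPhaseViolation:
    violated = True
```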

In a real-time database system, synchronization protocols must not only maintain the
consistency constraints of the database but also satisfy the timing requirements of the
transactions accessing the database. To satisfy both the consistency and real-time constraints,
there is a need to integrate synchronization protocols with real-time priority scheduling
protocols. A major source of problems in integrating the two protocols is the lack of
coordination in the development of synchronization protocols and real-time priority
scheduling protocols. Due to the effect of blocking in lock-based synchronization protocols, a direct
application of a real-time scheduling algorithm to transactions may result in a condition
known as priority inversion [6]. Priority inversion is said to occur when a higher-priority
process is forced to wait for the execution of a lower-priority process for an indefinite period
of time. When the transactions of two processes attempt to access the same data object,
the access must be serialized to maintain consistency. If the transaction of the higher-priority
process gains access first, then the proper priority order is maintained; however, if the
transaction of the lower-priority process gains access first and then the higher-priority
transaction requests access to the data object, this higher-priority process will be blocked until the
lower-priority transaction completes its access to the data object. Priority inversion is
inevitable in transaction systems. However, to achieve a high degree of schedulability in
real-time applications, priority inversion must be minimized. This is illustrated by the
following example.

Example: Suppose that T1, T2, and T3 are three transactions arranged in descending
order of priority, with T1 having the highest priority. Assume that T1 and T3 access the
same data object Oi. Suppose that at time t1 transaction T3 obtains a lock on Oi. During
the execution of T3, the high-priority transaction T1 arrives, preempts T3, and later attempts
to access the object Oi. Transaction T1 will be blocked because Oi is already locked. We
would expect that T1, being the highest-priority transaction, will be blocked no longer than
the time for transaction T3 to complete and unlock Oi. However, the duration of blocking
may, in fact, be unpredictable. This is because transaction T3 can be preempted by the
intermediate-priority transaction T2, which does not need to access Oi. The blocking of T3, and
hence that of T1, will continue until T2 and any other pending intermediate-priority
transactions are completed.
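The scenario in this example can be reproduced with a small unit-time simulation. The sketch below is an illustration only: the `simulate` function, the dictionary layout, and all arrival times and durations are assumptions chosen for the demonstration. It models preemptive priority scheduling with one lockable object and no priority inheritance.

```python
def simulate(transactions, horizon=200):
    """Preemptive priority scheduling with a single lockable object.

    Each transaction is a dict with keys name, priority (larger is higher),
    arrival, work (time units), and cs: None, or (start, end) meaning the
    lock is held while executing work units start..end-1.
    Returns a dict mapping each finished transaction to its finish time.
    """
    done = {t["name"]: 0 for t in transactions}  # units executed so far
    finish = {}
    holder = None  # name of the transaction currently holding the lock
    for now in range(horizon):
        ready = [t for t in transactions
                 if t["arrival"] <= now and t["name"] not in finish]
        ready.sort(key=lambda t: -t["priority"])
        for t in ready:
            name, cs = t["name"], t["cs"]
            needs_lock = cs is not None and cs[0] <= done[name] < cs[1]
            if needs_lock and holder not in (None, name):
                continue  # blocked on the lock; a lower priority may run
            if needs_lock:
                holder = name
            done[name] += 1
            if cs is not None and done[name] == cs[1]:
                holder = None  # critical section ends, lock released
            if done[name] == t["work"]:
                finish[name] = now + 1
            break  # one transaction executes per time unit
    return finish


def t1_finish_time(t2_work):
    """Finish time of T1 when the unrelated transaction T2 needs t2_work units."""
    txns = [
        {"name": "T1", "priority": 3, "arrival": 1, "work": 3, "cs": (1, 3)},
        {"name": "T2", "priority": 2, "arrival": 2, "work": t2_work, "cs": None},
        {"name": "T3", "priority": 1, "arrival": 0, "work": 4, "cs": (0, 4)},
    ]
    return simulate(txns)["T1"]


short_case = t1_finish_time(5)
long_case = t1_finish_time(20)
```

In this sketch the finish time of T1 grows one-for-one with the length of T2, even though T2 never touches the object that T1 and T3 share.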

The blocking duration in the example above can be arbitrarily long. This situation can
be partially remedied if transactions are not allowed to be preempted; however, this
solution is only appropriate for very short transactions, because it creates unnecessary
blocking. For instance, once a long low-priority transaction starts execution, a high-priority
transaction not requiring access to the same set of data objects may be needlessly blocked.

An approach to this problem, based on the notion of priority inheritance, has been
proposed [21]. The basic idea of priority inheritance is that when a transaction T of a process
blocks higher-priority processes, it executes at the highest priority of all the transactions
blocked by T. This simple idea of priority inheritance reduces the blocking time of a
higher-priority transaction. However, this is inadequate because the blocking duration for a
transaction, although bounded, can still be substantial due to the potential chain of blocking.
For instance, suppose that transaction T1 needs to sequentially access objects O1 and O2.
Also suppose that T2 preempts T3, which has already locked O2. Then T2 locks O1.
Transaction T1 arrives at this instant and finds that the objects O1 and O2 have been
respectively locked by the lower-priority transactions T2 and T3. As a result, T1 would be blocked
for the duration of two transactions, once to wait for T2 to release O1 and again to wait
for T3 to release O2. Thus a chain of blocking can be formed.
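Both points of this paragraph, the inherited priority and the chain of blocking, can be made concrete with a short sketch. The function names, the priority numbers, and the critical-section lengths are hypothetical, and the blocking relation is assumed to be free of cycles.

```python
def effective_priority(txn, base, blocked_by):
    """Priority at which `txn` executes under priority inheritance.

    `base` maps transaction -> assigned priority (larger is higher).
    `blocked_by` maps a blocked transaction -> the transaction blocking it.
    Inheritance is transitive; the relation is assumed to be acyclic.
    """
    waiters = [w for w, holder in blocked_by.items() if holder == txn]
    inherited = [effective_priority(w, base, blocked_by) for w in waiters]
    return max([base[txn]] + inherited)


def chained_blocking(needed_objects, lock_holder, remaining_cs):
    """Blocking faced by a transaction that needs `needed_objects` in turn:
    one wait per distinct lower-priority transaction holding one of them."""
    holders = {lock_holder[o] for o in needed_objects if o in lock_holder}
    return sum(remaining_cs[h] for h in holders)


base = {"T1": 3, "T2": 2, "T3": 1}

# T1 is blocked by T3, so T3 runs at priority 3 and T2 cannot preempt it.
p_t3 = effective_priority("T3", base, {"T1": "T3"})
p_t2 = effective_priority("T2", base, {"T1": "T3"})

# Chain of blocking: O1 is held by T2 and O2 by T3 (hypothetical durations).
wait = chained_blocking(["O1", "O2"],
                        {"O1": "T2", "O2": "T3"},
                        {"T2": 4, "T3": 6})
```

The second computation shows why inheritance alone is not enough: the wait is bounded, but it is the sum over every lower-priority holder in the chain.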

One idea for dealing with this inadequacy is to use a total priority ordering of active
transactions [22]. A transaction is said to be active if it has started but not yet completed
its execution. A transaction can be active in one of two states: executing or being
preempted in the middle of its execution. The idea of total priority ordering is that the real-time
locking protocol ensures that each active transaction is executed at some priority level,
taking priority inheritance and read/write semantics into consideration.


4.3. Total Ordering by Priority Ceiling

To ensure the total priority ordering of active transactions, three priority ceilings are defined
for each data object in the database: the write-priority ceiling, the absolute-priority
ceiling, and the rw-priority ceiling. The write-priority ceiling of a data object is defined as
the priority of the highest-priority transaction that may write into this object, and the
absolute-priority ceiling is defined as the priority of the highest-priority transaction that may read
or write the data object. The rw-priority ceiling is set dynamically. When a data object
is write-locked, the rw-priority ceiling of this data object is defined to be equal to the
absolute-priority ceiling. When it is read-locked, the rw-priority ceiling of this data object
is defined to be equal to the write-priority ceiling. The priority-ceiling protocol is
premised on systems with a fixed priority scheme. The protocol consists of two mechanisms:
priority inheritance and priority ceiling. With the combination of these two mechanisms,
we get the properties of freedom from deadlock and a worst-case blocking of at most a
single lower-priority transaction.

When a transaction attempts to lock a data object, the transaction's priority is compared
with the highest rw-priority ceiling of all data objects currently locked by other
transactions. If the priority of the transaction is not higher than the rw-priority ceiling, the access
request will be denied, and the transaction will be blocked. In this case, the transaction
is said to be blocked by the transaction that holds the lock on the data object of the highest
rw-priority ceiling. Otherwise, it is granted the lock. In the denied case, priority
inheritance is performed in order to overcome the problem of uncontrolled priority
inversion. For example, if transaction T blocks higher-priority transactions, T inherits PH, the
highest priority of the transactions blocked by T.
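The lock-granting rule just described can be sketched as below. This is a simplified illustration under stated assumptions: the `DataObject` class, the `request_lock` function, and the numeric priorities are hypothetical, and the inheritance step that follows a denial is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataObject:
    name: str
    write_ceiling: int     # highest priority of any transaction that may write it
    absolute_ceiling: int  # highest priority of any transaction that may read or write it
    readers: set = field(default_factory=set)
    writer: Optional[str] = None


def rw_ceiling(obj):
    """Dynamic ceiling: absolute if write-locked, write ceiling if read-locked."""
    if obj.writer is not None:
        return obj.absolute_ceiling
    if obj.readers:
        return obj.write_ceiling
    return None  # the object is not locked


def holders(obj):
    held_by = set(obj.readers)
    if obj.writer is not None:
        held_by.add(obj.writer)
    return held_by


def request_lock(txn, priority, target, mode, objects):
    """Apply the ceiling test; return (granted, set of blocking transactions)."""
    locked_by_others = [o for o in objects if holders(o) - {txn}]
    if locked_by_others:
        top = max(locked_by_others, key=rw_ceiling)
        if priority <= rw_ceiling(top):
            return False, holders(top) - {txn}
    if mode == "write":
        target.writer = txn
    else:
        target.readers.add(txn)
    return True, set()


# Priorities: T1 = 3, T2 = 2, T3 = 1. T1 and T3 write Oi; only T2 uses Oj.
oi = DataObject("Oi", write_ceiling=3, absolute_ceiling=3)
oj = DataObject("Oj", write_ceiling=2, absolute_ceiling=2)
database = [oi, oj]

first = request_lock("T3", 1, oi, "write", database)   # nothing is locked yet
second = request_lock("T2", 2, oj, "write", database)  # Oj is free, yet denied
third = request_lock("T1", 3, oi, "write", database)   # denied, blocked by T3
```

Note that the test never inspects the lock state of the requested object itself; it looks only at the ceilings of objects locked by other transactions, which is why a request for an unlocked object can be denied.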

Under this protocol, it is not necessary to check for the possibility of read-write
conflicts. For instance, when a data object is write-locked by a transaction, the rw-priority
ceiling is equal to the priority of the highest-priority transaction that can access it. Hence,
the protocol will block a higher-priority transaction that may write or read it. On the other
hand, when the data object is read-locked, the rw-priority ceiling is equal to the priority of
the highest-priority transaction that may write it. Hence, a transaction that attempts to write
it will have a priority no higher than the rw-priority ceiling and will be blocked. Only
transactions that read it and have a priority higher than the rw-priority ceiling will be allowed
to read-lock it, as read locks are compatible. Using the priority-ceiling protocol, mutual
deadlock of transactions cannot occur, and each transaction can be blocked by at most one
lower-priority transaction until it completes or suspends itself. The next example shows how
transactions are scheduled under the priority-ceiling protocol.

Example: Consider the same situation as in the previous example. According to the
protocol, the priority ceiling of Oi is the priority of T1. When T2 tries to access a data
object, it is blocked because its priority is not higher than the priority ceiling of Oi. Therefore
T1 will be blocked only once by T3 to access Oi, regardless of the number of data objects
it may access.

The total priority ordering of active transactions leads to some interesting behavior. As
shown in the example above, the priority-ceiling protocol may forbid a transaction from
locking an unlocked data object. At first sight, this seems to introduce unnecessary
blocking. However, this can be considered as the "insurance premium" for preventing deadlock
and achieving the block-at-most-once property.

Using  the  prototyping  environment,  we  have  investigated  issues  associated  with  this  idea 
of  total  ordering  in  priority-based  scheduling  protocols.  One  of  the  critical  issues  related 
to  the  total  ordering  approach  is  its  performance  compared  with  other  design  alternatives. 
In  other  words,  it  is  important  to  figure  out  what  is  the  actual  cost  for  the  “insurance 
premium”  of  the  total  priority-ordering  approach. 


4.4. Performance Evaluation

Various statistics have been collected for comparing the performance of the priority-ceiling
protocol with other synchronization control algorithms. Transactions are generated with
exponentially distributed interarrival times, and the data objects updated by a transaction are
chosen uniformly from the database. A transaction has an execution profile that alternates
data access requests with equal computation requests and some processing requirement
for termination (either commit or abort). Thus the total processing time of a transaction
is directly related to the number of data objects accessed. Due to space considerations,
we do not present all our results but have selected the graphs that best illustrate the
difference and performance of the algorithms. For example, we have omitted the results of
an experiment that varied the size of the database, and thus the number of conflicts, because
they only confirm and do not increase the knowledge yielded by other experiments.
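A workload of this shape can be generated with a few lines of code. The sketch below is an assumption-laden illustration: the function name, the parameter names, and the cost values are invented here, and only the three properties stated above (exponential interarrival times, uniform object choice, alternating access and computation followed by termination) are taken from the text.

```python
import random


def generate_workload(n_transactions, mean_interarrival, db_size, txn_size,
                      access_cost=1.0, compute_cost=1.0, terminate_cost=0.5,
                      seed=0):
    """Generate transactions with exponentially distributed interarrival
    times and uniformly chosen data objects (all costs are hypothetical)."""
    rng = random.Random(seed)
    now = 0.0
    workload = []
    for txn_id in range(n_transactions):
        now += rng.expovariate(1.0 / mean_interarrival)
        objects = rng.sample(range(db_size), txn_size)  # distinct, uniform
        profile = []
        for obj in objects:
            # Each data access is followed by an equal-sized computation step.
            profile.append(("access", obj, access_cost))
            profile.append(("compute", None, compute_cost))
        profile.append(("terminate", None, terminate_cost))
        workload.append({
            "id": txn_id,
            "arrival": now,
            "objects": objects,
            "profile": profile,
            # Total processing time grows linearly with the transaction size.
            "processing_time": sum(step[2] for step in profile),
        })
    return workload


workload = generate_workload(n_transactions=50, mean_interarrival=10.0,
                             db_size=500, txn_size=5, seed=1)
```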

For each experiment and for each algorithm tested, we collected performance statistics
and averaged them over the runs. The percentage of deadline-missing transactions is calculated
with the following equation: %missed = 100 * (number of deadline-missing
transactions / number of transactions processed). A transaction is processed if either it executes
completely or it is aborted. We assume that all the transactions are hard in the sense that
there will be no value for completing the transaction after its deadline. Transactions that
miss the deadline are aborted and disappear from the system immediately with some abort
cost. We have used the transaction size (the number of data objects a transaction needs
to access) as one of the key variables in the experiments. It varies from a small fraction
up to a relatively large portion (10%) of the database so that conflicts would occur frequently.
The high conflict rate allows synchronization protocols to play a significant role in the system
performance. We choose the arrival rate so that protocols are tested in a heavily loaded rather
than a lightly loaded system. In order to design real-time systems, one must consider
high-load situations. Even though they may not arise frequently, one would like to have a system
that misses as few deadlines as possible when such peaks occur. In other words, when
a crisis occurs and the database system is under pressure is precisely when making a few
extra deadlines could be most important [S].
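The miss percentage defined above, together with the records-per-second normalization of throughput that the text introduces next, can be computed as in the following sketch. The record layout (`status`, `missed_deadline`) and the function names are assumptions made for illustration.

```python
def percent_missed(records):
    """100 * (deadline-missing transactions / transactions processed).

    A transaction counts as processed if it committed or was aborted.
    """
    processed = [r for r in records if r["status"] in ("committed", "aborted")]
    if not processed:
        return 0.0
    missed = sum(1 for r in processed if r["missed_deadline"])
    return 100.0 * missed / len(processed)


def normalized_throughput(completion_rate, txn_size):
    """Records accessed per second: transactions per second multiplied by
    the number of database records accessed per transaction."""
    return completion_rate * txn_size


# Hypothetical run: three commits on time, one abort after a missed deadline,
# and one transaction still running when the measurement ends.
records = [
    {"status": "committed", "missed_deadline": False},
    {"status": "committed", "missed_deadline": False},
    {"status": "committed", "missed_deadline": False},
    {"status": "aborted", "missed_deadline": True},
    {"status": "running", "missed_deadline": False},
]
missed_pct = percent_missed(records)
throughput = normalized_throughput(completion_rate=2.0, txn_size=8)
```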

We normalize the transaction throughput in records accessed per second for successful
transactions, not in transactions per second, in order to account for the fact that bigger
transactions need more database processing. The normalization rate is obtained by
multiplying the transaction completion rate (transactions/second) by the transaction size (database
records accessed/transaction).

In Figure 5, the throughput of the priority-ceiling protocol (C), the two-phase locking
protocol with priority mode (P), and the two-phase locking protocol without priority mode
(L) is shown for transactions of different sizes with balanced workload and I/O-bound
workload. The two important factors affecting the performance of locking protocols are
their abilities to resolve locking conflicts and to perform I/O and transactions in parallel.
When the transaction size is small, there is little locking conflict, and problems such
as deadlock and priority inversion have little effect on the overall performance of a locking
protocol. On the other hand, when the transaction size becomes large, the probability of
locking conflicts rises rapidly. In fact, the probability of deadlocks goes up with the fourth
power of the transaction size [23]. Hence, we would expect that the performance of
protocols will be dominated by their abilities to handle locking conflicts when the transaction size
is large.

As illustrated in Figure 5, the performance of the two-phase locking protocol, with or
without priority assignments to transactions, degrades very fast when the transaction size
increases. This can be attributed to the inability of this protocol to prevent deadlock and
priority inversions. On the other hand, the priority-ceiling protocol handles locking
conflicts very well. The protocol performs much better than the two-phase locking protocol
when the transaction size is large. The main weakness of the priority-ceiling protocol is
its inability to perform I/O and transactions in parallel. For example, suppose that
transaction T has a lock on O1 and it now wants to lock data object O2. Unfortunately, O2 is not
in the main memory. As a result, T is suspended. However, transactions with priorities
lower than the rw-priority ceiling of O1 are not allowed to execute either. This could lead to
the idling of the processor until either O2 is transferred to the main memory or a
transaction whose priority is higher than the rw-priority ceiling arrives. We refer to this type of
blocking as I/O blocking. When the transaction size is small, the locking conflict rate is small.
Hence, the two-phase locking protocol performs well. However, due to I/O blocking the
throughput of the priority-ceiling protocol is not as good as that of the two-phase locking
protocol, especially when the workload is I/O bound.

Because I/O cost is one of the key parameters in determining performance, we have
investigated an approach to improve system performance by performing I/O operations before
locking, called intention I/O. In the intention mode of I/O operation, the system
prefetches data objects that are in the access lists of submitted transactions without locking
them. This approach will reduce the locking time of data objects, resulting in higher
throughput. As shown in Figure 6, intention I/O improves the throughput of both the two-phase
locking and the ceiling protocol. However, the improvement in the ceiling protocol is much
more significant. This is because intention I/O effectively solves the I/O blocking problem
of the priority-ceiling protocol.

Figure 5. Transaction throughput. C: priority-ceiling protocol. P: 2-phase locking protocol with priority mode.
L: 2-phase locking protocol without priority mode.

Figure 6. Transaction throughput with intention I/O.

Figure 7. Percentage of missing deadline.

Another important performance statistic is the percentage of deadline-missing
transactions, since the synchronization protocol in real-time database systems must satisfy the
timing constraint of each individual transaction. In our experiments, each transaction's deadline
is set proportional to its size and the system workload (number of transactions), and the
transaction with the earliest deadline is assigned the highest priority. As shown in Figure
7, the percentage of deadline-missing transactions increases sharply for the two-phase locking
protocol as the transaction size increases, due to its inability to deal with deadlock and to
give preference to transactions with shorter deadlines. Two-phase locking with priority
assignment performs somewhat better because the timing constraints of transactions are
considered, although the deadlock and priority inversion problems still handicap its
performance. The priority-ceiling protocol has the best relative performance because it
addresses both the deadlock and priority inversion problems. A drawback of the priority-ceiling
protocol from a practical viewpoint is that it needs knowledge of all transactions that will
be executed in the future. This may be a very strong requirement to satisfy in some
applications.
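The deadline and priority assignment used in these experiments can be illustrated with a short sketch. The text says only that the deadline is proportional to the transaction size and the system workload; the product form, the `slack_factor` constant, and the function names below are assumptions made for illustration.

```python
def assign_deadline(arrival, size, workload, slack_factor=2.0):
    """Deadline proportional to transaction size and system workload.

    The product form and slack_factor are assumptions; the source states
    only the proportionality.
    """
    return arrival + slack_factor * size * workload


def earliest_deadline_first(transactions):
    """Order transactions so that the earliest deadline comes first,
    which corresponds to the highest priority."""
    return sorted(transactions, key=lambda t: t["deadline"])


# Hypothetical transactions, all submitted while 10 transactions are active.
pending = [
    {"name": "A", "arrival": 0.0, "size": 8},
    {"name": "B", "arrival": 5.0, "size": 2},
    {"name": "C", "arrival": 1.0, "size": 4},
]
for txn in pending:
    txn["deadline"] = assign_deadline(txn["arrival"], txn["size"], workload=10)

priority_order = [t["name"] for t in earliest_deadline_first(pending)]
```

Under this assignment a small transaction that arrives later can still outrank a large one that arrived earlier, because its deadline falls sooner.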

The priority-ceiling protocol takes a conservative approach. It is based on two-phase
locking and employs only blocking, but not rollback, to solve conflicts. For conventional
database systems, it has been shown that optimal performance may be achieved by a
compromise between blocking and rollback [24]. For real-time database systems, we may expect
similar results. Aborting a few low-priority transactions and restarting them later may allow
high-priority transactions to meet their deadlines, resulting in improved system performance.
Several concurrency control protocols based on the optimistic approach have been proposed
[9, 11, 25]. They incorporate priority-based conflict resolution mechanisms, such as
priority wait, that make low-priority transactions wait for conflicting high-priority transactions
to complete. However, this approach of detecting conflicts during the validation phase degrades
system predictability. A transaction is detected as being late only when it actually misses its
deadline, as the transaction is only aborted in the validation phase.

4.5.  Multiversion  Database  System 

To illustrate the effectiveness of the prototyping environment, we have investigated the
performance of a multiversion database system. There is no correlation between the
priority-ceiling protocol study and the multiversion database study.

In a multiversion database system, each data object consists of a number of consecutive
versions. The objective of using multiple versions in real-time database systems is to
increase the degree of concurrency and to reduce the possibility of rejecting user requests
by providing a succession of views of data objects. One of the reasons for rejecting a user
request is that its operations cannot be serviced by the system. For example, a read
operation has to be rejected if the value of the data object it was supposed to read has already
been overwritten by some other user request. Such rejections can be avoided by keeping old
versions of each data object so that an appropriate old value can be given to a tardy read
operation. In a system with multiple versions of data, each write operation on a data object
produces a new version instead of overwriting it. Hence, for each read operation, the system
selects an appropriate version to read, enjoying flexibility in controlling the order of
read and write operations. When a new version is created, it is uncertified. Uncertified
versions are prohibited from being read by other transactions to guarantee freedom from
cascaded aborts [26]. A version is certified at the commit time of the transaction that generated the
version.

The multiversion database system we have implemented is based on timestamp ordering.
Each data object is represented as a list of versions, and each version is associated with
timestamps for its creation and the latest read, and a valid bit to specify whether the version
is certified. The multiversion concurrency control scheme we have implemented is called the
"multiversion timestamp ordering method" and is proved to satisfy serializability [26].

Each transaction consists of read and write requests for data objects. Read requests are
never rejected in a multiversion database system if all the versions are retained. A read
operation does not necessarily read the latest committed version of a data object. A read
request is transformed to a version-read operation by selecting an appropriate version to
read. The timestamp of a read request is compared with the write-timestamp of the highest
available version. When a read request with timestamp T is sent to the resource manager,
the version of a data object with the largest timestamp less than T is selected as the value
to be returned. Figure 8 shows an example of a read operation with a timestamp "11".
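The version-selection rule for reads can be sketched as follows. The `Version` class, the `read_version` function, and the timestamps of the stored versions are hypothetical; only the read timestamp 11 comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Version:
    write_ts: int            # timestamp of the write that created the version
    value: object
    read_ts: int = 0         # timestamp of the latest read of this version
    certified: bool = False  # valid bit: set when the writer commits


def read_version(versions, ts):
    """Select the certified version with the largest write timestamp
    less than `ts`, record the read, and return it (None if none exists)."""
    candidates = [v for v in versions if v.certified and v.write_ts < ts]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda v: v.write_ts)
    chosen.read_ts = max(chosen.read_ts, ts)
    return chosen


# Two certified versions and one uncertified version (timestamps assumed).
history = [
    Version(write_ts=5, value="old", certified=True),
    Version(write_ts=9, value="new", certified=True),
    Version(write_ts=12, value="pending"),
]
seen_at_11 = read_version(history, 11)
seen_at_7 = read_version(history, 7)
seen_at_3 = read_version(history, 3)
```

A read with timestamp 7 is served from the older version rather than rejected, which is the benefit of retaining versions that the text describes.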

The timestamp of a write request is compared with the read timestamp of the highest
version of the data object. A new version with a timestamp greater than the read-timestamp
of the highest certified version is built on the upper level, with the valid bit reset to
indicate that the new version is not certified yet. In order to simplify the concurrency control
mechanism, we allow only one temporary version for each data object. Inserting a new
version in the middle of existing valid versions is not allowed.
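A minimal sketch of the write rule follows. The names are hypothetical, and one behavior is an assumption not stated in the text: a write that arrives while an uncertified version exists is rejected here, whereas the actual system might make it wait instead.

```python
from dataclasses import dataclass


@dataclass
class Version:
    write_ts: int
    value: object
    read_ts: int = 0
    certified: bool = False  # valid bit


def write_version(versions, ts, value):
    """Append a new uncertified version on top of the list, or reject.

    `versions` is kept in increasing write_ts order. Returns True if the
    version was created.
    """
    if any(not v.certified for v in versions):
        return False  # only one temporary version per data object (assumed: reject)
    if versions:
        top = versions[-1]
        # No insertion in the middle: the write must follow the latest
        # write and the latest read of the highest version.
        if ts <= top.write_ts or ts <= top.read_ts:
            return False
    versions.append(Version(write_ts=ts, value=value, read_ts=ts))
    return True


def certify(versions, ts):
    """Set the valid bit of the version written at `ts` (at commit time)."""
    for v in versions:
        if v.write_ts == ts:
            v.certified = True


obj = [Version(write_ts=5, value="a", read_ts=11, certified=True)]
late_write = write_version(obj, 8, "b")     # a reader at 11 already saw version 5
good_write = write_version(obj, 12, "c")    # becomes the temporary version
second_temp = write_version(obj, 13, "d")   # rejected: one temporary version only
certify(obj, 12)
after_commit = write_version(obj, 13, "d")  # accepted once version 12 is certified
```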

The experiment was conducted to measure the average response time and the number
of aborts for a group of transactions running on a multiversion database system and its
corresponding single-version system. Two groups of transactions with different characteristics
(e.g., type and number of accesses to data objects) were executed concurrently. The
objective was to study the sensitivity of parameters on those two performance measures.
Here we present our findings briefly.

Performance is highly dependent on the set size of transactions. As shown in Figure 9,
a multiversion database system outperforms the corresponding single-version system for
the type of workload under which it is expected to be beneficial: a mix of small update
transactions and larger read-only transactions. The reason for this is that, in a
multiversion database system, read requests have higher priority than write requests, whereas
priority for read requests is not provided in a single-version system. Therefore, in a
single-version system, the probability of rejecting a read request is equal to that of a write
request. The experiment shows that a single-version database system outperforms its
multiversion counterpart for a different transaction mix.


Figure 8. A read operation with two certified versions of a data object.

It was observed that the performance of a multiversion system in terms of the number of
aborts is better than that of its single-version counterpart for a mix of small update
transactions and larger read-only transactions. Similar experiments have been performed by
changing the database size and the mean interarrival time of transactions. It was found, however,
that the main result remains the same. From these experiments, it becomes clear that among the
four variables we studied, the set size of transactions is the most sensitive parameter for
determining the performance of a multiversion database system. This experiment demonstrates
the expressive power and performance evaluation capability of the prototyping environment.

5.  Conclusions 

Prototyping large software systems is not a new approach. However, methodologies for
developing a prototyping environment for real-time database systems have not been
investigated in depth in spite of their potential benefits. In this paper, we have presented a
prototyping environment that has been developed based on the StarLite concurrent
programming kernel and a message-based approach with modular building blocks. Although the
complexity of a distributed database system makes prototyping difficult, the implementation
has proven satisfactory for experimentation with design choices, different database control
techniques, and even an integrated evaluation of database systems.

There  are  three  main  goals  to  be  achieved  in  developing  a  prototyping  environment  for 
real-time  database  systems:  modularity,  flexibility,  and  extensibility.  Modularity  enables 
the  environment  to  be  easily  reconfigured  as  any  subset  of  the  available  modules  can  be 
combined  to  produce  a  new  testing  environment. 

An  additional  benefit  of  the  “right”  modularity  is  that  actual  system  software  can  be 
developed  in  the  prototyping  environment  and  then  ported  to  the  target  machine.  This  is 
enabled  by  the  use  of  technology-independent  interfaces  that  are  general  enough  to  support 
any target system architecture. In addition to portability, programs may be run in a
“hybrid”  mode,  that  is,  not  all  service  calls  need  be  simulated.  For  example,  file  system 
calls  in  the  application  program  can  be  intercepted  by  the  interpreter  and  directed  to  the 
existing  host  file  system.  Then,  as  a  file  system  is  developed,  the  file  system  calls  can  be 
directed  to  it.  If  the  file  system  is  not  necessary  or  is  not  the  focus  of  the  current  research, 
it  need  not  be  developed.  This  feature  of  the  prototyping  environment  allows  the  developer 
to  focus  on  only  pertinent  design  issues. 

Flexibility  enables  the  prototyping  environment  to  be  applicable  over  a  wide  range  of 
configurations  and  system  parameters.  One  of  the  keys  to  achieving  this  goal  is  to  design 
interfaces whose operations are independent of both the implementation technology and
the context in which they are used. For example, the user-level Send operation sends an
array  of  bytes  to  an  abstract  data  type,  the  PortTag.  Thus  this  operation  can  be  used  to 
send  any  packet  type  to  any  destination,  be  it  local  or  distant. 
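
As a rough illustration of such a context-independent interface, the following Python sketch models a Send operation that takes only a port tag and an array of bytes. The class layout, the `site` and `port` fields, and the `remote_sent` counter are assumptions made for this example; the source does not describe how PortTag is represented, and this toy keeps all queues in one process.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PortTag:
    """Abstract destination; callers never need to know where the port lives."""
    site: int   # hypothetical field: site that owns the port
    port: int   # hypothetical field: port number within that site


class MessageServer:
    """Toy message server with one queue per port tag (single-process model)."""

    def __init__(self, local_site):
        self.local_site = local_site
        self.queues = {}
        self.remote_sent = 0  # packets that would cross the network

    def send(self, tag, payload):
        """Send an array of bytes to a port tag, whether local or distant."""
        if tag.site != self.local_site:
            # A real system would hand the packet to the network here.
            self.remote_sent += 1
        self.queues.setdefault(tag, deque()).append(bytes(payload))

    def receive(self, tag):
        """Return the oldest undelivered payload for the given port tag."""
        return self.queues[tag].popleft()
```

Because `send` accepts any byte payload and any `PortTag`, the caller's code is the same for every packet type and every destination; only the server decides whether delivery is local.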

Figure 9. Average transaction-response time.

The third goal is that the prototyping environment be extensible enough to model additional
features of particular systems by adding modules without affecting the operation of,
or requiring the recompilation of, existing modules. For instance, the implementation can
be extended to model the operation of different types of I/O devices of different speeds
by modifying the implementation module that performs the read and write operations. One
way to modify the implementation would be to delay for a period depending on the address
passed to the read or write operation. Reading from a disk might be indicated by one range
of addresses and take some time, while reading from a tape drive might be indicated by
another range and presumably take longer. However, because the interface of this module
is device independent, changing the implementation to process I/O requests at different
speeds will not affect any of the modules that request I/O operations. Therefore, time and
effort for system reconfiguration can be reduced.
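
The address-dependent delay can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the StarLite module: the latency values, the `TAPE_BASE` address boundary, and the `SimClock` and `IODevice` names are all invented for the example.

```python
DISK_LATENCY = 20        # simulated time units (assumed value)
TAPE_LATENCY = 500       # assumed value, chosen only to be longer than disk
TAPE_BASE = 1_000_000    # assumed boundary: addresses at or above it mean "tape"


class SimClock:
    """Simulated clock advanced explicitly by the modules that consume time."""

    def __init__(self):
        self.now = 0

    def advance(self, ticks):
        self.now += ticks


class IODevice:
    """Device-independent read/write interface; only the delay model varies."""

    def __init__(self, clock):
        self._clock = clock
        self._store = {}

    def _delay(self, address):
        # The address range selects the device type and hence the delay.
        latency = TAPE_LATENCY if address >= TAPE_BASE else DISK_LATENCY
        self._clock.advance(latency)

    def write(self, address, data):
        self._delay(address)
        self._store[address] = data

    def read(self, address):
        self._delay(address)
        return self._store.get(address)
```

Callers use only `read` and `write`, so replacing `_delay` with a different timing model changes simulated device speed without touching any module that requests I/O.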

The expressive power and performance evaluation capability of our prototyping environment
have been demonstrated by implementing real-time database systems and investigating the
performance characteristics of the priority-ceiling protocol and multiversion databases.

In real-time database systems, transactions must be scheduled to meet their timing
constraints. In addition, the system should behave predictably, so that the possibility
of a critical task missing its deadline can be reported ahead of time, before the deadline
expires. The priority-ceiling protocol is one approach to achieving a high degree of
schedulability and system predictability. In this paper, we have investigated this approach
and compared its performance with other techniques and design choices. It is shown that this
technique might be appropriate for real-time transaction scheduling, since it is very stable
over a wide range of transaction sizes and, compared with two-phase locking protocols, it
reduces the number of deadline-missing transactions.
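
For readers unfamiliar with the protocol, the basic priority-ceiling locking rule can be sketched in Python as below. This is the textbook form of the rule only, offered as an assumption about what is meant here; it omits the read/write ceiling refinements used for databases and is not the implementation evaluated in this paper. Transactions are identified by their unique priorities, larger numbers mean higher priority, and all names are hypothetical.

```python
def priority_ceiling(obj, access_sets):
    """Ceiling of obj: the highest priority of any transaction that may access it.

    access_sets maps each transaction's priority to the set of objects it may use.
    """
    return max(prio for prio, objs in access_sets.items() if obj in objs)


def may_lock(requester, locks_held, access_sets):
    """Decide whether a lock request is granted under the basic ceiling rule.

    locks_held maps each locked object to the priority of the transaction that
    holds it. The request is granted only if the requester's priority is strictly
    higher than the ceiling of every object locked by some other transaction.
    """
    for obj, holder in locks_held.items():
        if holder != requester and priority_ceiling(obj, access_sets) >= requester:
            return False
    return True
```

For example, with `access_sets = {3: {"A"}, 2: {"B"}, 1: {"A", "B"}}` and the lowest-priority transaction holding `"B"`, the ceiling of `"B"` is 2, so a request from priority 2 is blocked while a request from priority 3 is granted.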

Using the prototyping environment, we have shown that, in general, a database system
with a multiversion concurrency control algorithm performs better for processing read
requests. Read requests that would be aborted in a single-version database system due to
conflicts may be successfully processed in a multiversion system using older versions.
Therefore, when read requests dominate the transaction load, and there is a high
probability that read-only transactions are aborted due to conflicts, a multiversion system
outperforms its corresponding single-version system. The relative size of the read and
write sets of transactions is an important factor affecting the performance. Although the
actual performance figures will vary depending on workload and implementation details, we
believe that our results provide a good picture of the costs and benefits associated with
the multiversion approach to concurrency control.
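
The difference in how a late read is handled can be illustrated with a small Python sketch based on timestamp ordering. It is a simplified illustration under assumptions, not the algorithm implemented in the prototype: write-side conflict checks are omitted, timestamps are assumed to be nonnegative integers, and the class names are hypothetical.

```python
import bisect


class Abort(Exception):
    """Raised when a request conflicts and the transaction must be aborted."""


class SingleVersionObject:
    """Keeps only the latest value; a read older than that value is rejected."""

    def __init__(self, value):
        self.value = value
        self.write_ts = 0

    def write(self, ts, value):
        self.value, self.write_ts = value, ts

    def read(self, ts):
        if ts < self.write_ts:
            raise Abort("value already overwritten by a later transaction")
        return self.value


class MultiVersionObject:
    """Keeps every version, sorted by write timestamp."""

    def __init__(self, value):
        self._ts = [0]
        self._values = [value]

    def write(self, ts, value):
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._values.insert(i, value)

    def read(self, ts):
        # Serve the newest version written at or before the reader's timestamp.
        i = bisect.bisect_right(self._ts, ts) - 1
        if i < 0:
            raise Abort("no version old enough for this reader")
        return self._values[i]
```

After a write at timestamp 10, a reader with timestamp 5 is aborted by `SingleVersionObject` but is served the older version by `MultiVersionObject`, which is the effect described in the paragraph above.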

Real-time distributed database systems need further investigation. In the priority-ceiling
protocol and many other database scheduling algorithms, preemption is usually not allowed.
To reduce the number of deadline-missing transactions, however, preemption may need
to be considered. The preemption decision in a real-time database system must be made
very carefully and, as pointed out in [27], should not necessarily be based only on relative
deadlines, because preemption implies not only that the work done by the preempted
transaction must be undone, but also that the transaction, if restarted later, must redo
that work. The resultant delay and the wasted execution may cause one or both of these
transactions, as well as other transactions, to miss their deadlines. Several approaches to
designing scheduling algorithms for real-time transactions have been proposed [5, 7, 26],
but their performance in distributed environments has not been studied. The prototyping
environment described in this paper is an appropriate research vehicle for investigating
such new techniques and scheduling algorithms for real-time database systems.